Abstract

We present a new inner bound for the rate region of the t-stage successive-refinement problem 
with side-information. We also present a new upper bound for the rate-distortion function for lossy
source coding with multiple decoders and side-information. Characterising this rate-distortion function
is a long-standing open problem, and it is widely believed that the tightest upper bound is provided by 
Theorem 2 of Heegard and Berger's paper "Rate Distortion when Side Information may be Absent," 
IEEE Trans. Inform. Theory, 1985. We give a counterexample to Heegard and Berger's result. 
I. Introduction

One of the most important and celebrated results in multi-terminal information theory is Wyner and Ziv's solution to the problem of lossy source coding with side-information at the decoder [1], the Wyner-Ziv problem (fig. 1). The main objective of this problem is to find a computable characterisation [2, Pg. 259] of the rate-distortion function $R(d)$. This function describes the smallest rate at which the encoder can compress an iid random sequence $\mathbf{X}$ so that the decoder, which has side-information $\mathbf{Y}$, can produce a replica $\hat{\mathbf{X}}$ of $\mathbf{X}$ that satisfies the average distortion constraint

\[ E\left[ \frac{1}{n} \sum_{i=1}^{n} \delta(X_i, \hat{X}_i) \right] \le d, \tag{1} \]

where $\delta$ is a real-valued distortion measure [3] and $E[\cdot]$ is the expectation operation. In [1, Thm. 1], Wyner and Ziv famously showed that



\[ R(d) = \min \{ I(X;U) - I(U;Y) \}, \tag{2} \]



where the minimization is taken over all choices of an auxiliary random variable $U$ that is jointly distributed with $(X,Y)$ and which satisfies the following two properties: (1) $U$ is conditionally independent of $Y$ given $X$; and (2) there exists a function $\hat{X}(U,Y)$ with $E\,\delta(X, \hat{X}(U,Y)) \le d$.
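As a rough numerical illustration of the objective in (2), the sketch below evaluates I(X;U) - I(U;Y) for one fixed test channel p(u|x), with U generated from X alone so that property (1) holds. It does not perform the minimization or check the distortion constraint in property (2). The binary source and the names `mutual_information` and `wyner_ziv_objective` are assumptions made for this example only.

```python
from math import log2

def mutual_information(joint):
    """I(A;B) in bits for a joint pmf given as a nested list joint[a][b]."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    return sum(p * log2(p / (pa[a] * pb[b]))
               for a, row in enumerate(joint)
               for b, p in enumerate(row) if p > 0)

def wyner_ziv_objective(q_xy, p_u_given_x):
    """I(X;U) - I(U;Y) when U depends on (X, Y) only through X."""
    n_x, n_y, n_u = len(q_xy), len(q_xy[0]), len(p_u_given_x[0])
    q_x = [sum(row) for row in q_xy]
    # Joint pmf of (X, U): q(x) p(u|x).
    p_xu = [[q_x[x] * p_u_given_x[x][u] for u in range(n_u)] for x in range(n_x)]
    # Joint pmf of (U, Y): sum over x of p(u|x) q(x, y).
    p_uy = [[sum(p_u_given_x[x][u] * q_xy[x][y] for x in range(n_x))
             for y in range(n_y)] for u in range(n_u)]
    return mutual_information(p_xu) - mutual_information(p_uy)

# Hypothetical source: X is a fair bit and Y equals X flipped with probability 0.1.
q_xy = [[0.45, 0.05], [0.05, 0.45]]
identity = [[1.0, 0.0], [0.0, 1.0]]   # U = X, which permits zero Hamming distortion
constant = [[1.0], [1.0]]             # U carries no information about X
rate_identity = wyner_ziv_objective(q_xy, identity)
rate_constant = wyner_ziv_objective(q_xy, constant)
```

With U = X the objective equals H(X|Y), the Slepian-Wolf rate for this source, and with a constant U it is zero; points between these two extremes trade rate against distortion.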

In this paper, we study the following two extensions of the Wyner-Ziv problem: (1) the Wyner-Ziv problem with multiple decoders (fig. 3); and (2) the successive-refinement problem with side-information (fig. 4). A brief history of the literature on these problems is as follows.
Fig. 1. The Wyner-Ziv Problem: $(\mathbf{X},\mathbf{Y}) = (X_1,Y_1), (X_2,Y_2), \ldots, (X_n,Y_n)$ is an iid random sequence emitted by a source $q(x,y) = \Pr[X = x, Y = y]$. The encoder maps $\mathbf{X}$ to an index $M$, which belongs to a finite set, at a rate $r$. Using $M$ and $\mathbf{Y}$, the decoder is required to generate a replica $\hat{\mathbf{X}} = \hat{X}_1, \hat{X}_2, \ldots, \hat{X}_n$ of $\mathbf{X}$ to within an average distortion $d$, according to (1). The rate-distortion function $R(d)$ is defined as the smallest rate for which such a reconstruction is possible. A single-letter expression for this function was first given in [1, Thm. 1].
A. The Wyner-Ziv Problem with t-Decoders

Suppose that the side-information $\mathbf{Y}$ in Figure 1 is unreliable in the sense that it may or may not be available to the decoder. If the encoder does not know a priori when $\mathbf{Y}$ is available, then Wyner and Ziv's coding argument for (2) fails, and a more sophisticated argument is required to exploit $\mathbf{Y}$. This observation inspired Kaspi [4] in 1980 (published by Wyner on behalf of Kaspi in 1994) as well as Heegard and Berger [5] in 1985 to independently study the problem shown in fig. 2, the Kaspi/Heegard-Berger problem. As with the Wyner-Ziv problem, the objective of this problem is to characterise the corresponding rate-distortion function $R(d_1,d_2)$; that is, to find the smallest rate such that decoders 1 and 2 can produce replicas $\hat{\mathbf{X}}_1$ and $\hat{\mathbf{X}}_2$ of $\mathbf{X}$ to within average distortions $d_1$ and $d_2$, respectively. To this end, Heegard and Berger [5, Thm. 1] showed that^1

\[ R(d_1,d_2) = \min_{U,W} \{ I(X;W) + I(X;U|Y,W) \}, \]

where the minimization is taken over all choices of two auxiliary random variables, $U$ and $W$, that are jointly distributed with $(X,Y)$ and which satisfy the following two properties: (1) $(U,W)$ is conditionally independent of $Y$ given $X$; and (2) there exist functions $\hat{X}_1(W)$ and $\hat{X}_2(Y,U,W)$ with $E\,\delta(X,\hat{X}_1(W)) \le d_1$ and $E\,\delta(X,\hat{X}_2(Y,U,W)) \le d_2$, respectively.

The Kaspi/Heegard-Berger problem in Figure 2 was further generalised by Heegard and Berger in [5, Sec. VII] to the problem shown in Figure 3. There are t decoders, each with different side-information, and the objective is to characterise the corresponding rate-distortion function $R(\mathbf{d})$. Unfortunately, this function has eluded characterisation for all but a few special cases. For example, Heegard and Berger [5, Thm. 3] have characterised $R(\mathbf{d})$ for stochastically degraded side-information^2; Tian and Diggavi [6], [7] have characterised $R(\mathbf{d})$ for a quadratic Gaussian source with jointly Gaussian side-information; and Sgarro's result [8, Thm. 1] subsumes the corresponding lossless problem. Notwithstanding this difficulty, however, this problem has helped
stimulate a number of important results 0> @, (TO)- 

1 Kaspi's result [4, Thm. 2] gives an alternative characterisation of $R(d_1,d_2)$ that uses one auxiliary random variable.
2 The joint probability distribution of $(X, Y_1, Y_2, \ldots, Y_t)$ can be manipulated to form the Markov chain $X \leftrightarrow Y_t \leftrightarrow Y_{t-1} \leftrightarrow \cdots \leftrightarrow Y_1$ without altering $R(\mathbf{d})$. We discuss this problem in detail in Section II-C.
Fig. 2. The Kaspi/Heegard-Berger Problem: The encoder compresses $\mathbf{X}$ in a manner suitable for two decoders, one of which has side-information $\mathbf{Y}$. The rate-distortion function $R(\mathbf{d})$ defines the smallest rate at which decoders 1 and 2 can generate replicas $\hat{\mathbf{X}}_1 = \hat{X}_{1,1}, \hat{X}_{1,2}, \ldots, \hat{X}_{1,n}$ and $\hat{\mathbf{X}}_2 = \hat{X}_{2,1}, \hat{X}_{2,2}, \ldots, \hat{X}_{2,n}$ of $\mathbf{X}$ to within average distortions $d_1$ and $d_2$, respectively. A single-letter expression for this function was independently given in [4] and [5].



In [5, Thm. 2], Heegard and Berger claimed that a certain functional, $R_0(\mathbf{d})$, is an upper bound for $R(\mathbf{d})$. (The expression for $R_0(\mathbf{d})$ is given in equation (4) of Section II; however, this expression requires notation from Section II.) For twenty-five years, $R_0(\mathbf{d})$ has been universally considered to be the tightest upper bound for $R(\mathbf{d})$ in the literature. In Example 3 of Section II we present a counterexample to [5, Thm. 2] that shows $R_0(\mathbf{d})$ is not an upper bound for $R(\mathbf{d})$. The invalidity of [5, Thm. 2] is by no means obvious, as it involves a difficult minimization over $(2^t - 1)$ auxiliary random variables. Indeed, we note that this theorem has been cited with modest frequency in the literature, and all the while this error appears to have gone unnoticed. We present a new upper bound for $R(\mathbf{d})$ in Theorem 2 of Section IV.



B. The Successive-Refinement Problem with Side-Information

The aforementioned counterexample led us to study the t-stage (or, t-decoder) successive-refinement problem shown in Figure 4. The encoder maps $\mathbf{X}$ to t indices: $M_1, M_2, \ldots, M_t$. It is required that decoder $l$ uses indices $M_1$ through $M_l$ together with its side-information $\mathbf{Y}_l$ to produce a replica $\hat{\mathbf{X}}_l = \hat{X}_{l,1}, \hat{X}_{l,2}, \ldots, \hat{X}_{l,n}$ of $\mathbf{X}$ to within an average distortion $d_l$. The objective of this problem is to characterise the resulting admissible-rate region $\mathcal{R}(\mathbf{d})$; that is, to determine the set of all rate tuples $\mathbf{r} = (r_1, r_2, \ldots, r_t)$ for which each decoder can reconstruct $\mathbf{X}$ to within its desired distortion level.

Fig. 3. The Wyner-Ziv problem with t decoders. The encoder compresses $\mathbf{X}$ in a manner suitable for t decoders, each of which has different side-information. The rate-distortion function $R(\mathbf{d})$, where $\mathbf{d} = (d_1, d_2, \ldots, d_t)$, defines the smallest rate at which decoder $l$, for all $l = 1, 2, \ldots, t$, can generate a replica $\hat{\mathbf{X}}_l$ of $\mathbf{X}$ to within an average distortion $d_l$. This problem is open for t > 2. We present an upper bound for $R(\mathbf{d})$ in Theorem 2.

Assuming the side-information is stochastically degraded, Steinberg and Merhav [9] characterised $\mathcal{R}(d_1,d_2)$ for t = 2 decoders. Shortly thereafter, Tian and Diggavi [6] extended this problem to t decoders and proved the following result.

Proposition 1: If the side-information is stochastically degraded, then $\mathcal{R}(\mathbf{d})$ is equal to the set of all rate tuples $\mathbf{r}$ for which there exist t auxiliary random variables $U_1, U_2, \ldots, U_t$ such that

\[ \sum_{k=1}^{l} r_k \ge \sum_{k=1}^{l} I(X; U_k \,|\, Y_k, U_1, \ldots, U_{k-1}) \]

for all $l = 1, 2, \ldots, t$, where

1) $(U_1, U_2, \ldots, U_t)$ is conditionally independent of $(Y_1, Y_2, \ldots, Y_t)$ given $X$; and

2) there exist t functions $\hat{X}_l(U_l, Y_l)$, $l = 1, 2, \ldots, t$, with $E\,\delta_l(X, \hat{X}_l(U_l, Y_l)) \le d_l$.
Fig. 4. The successive-refinement problem with t stages. The encoder compresses $\mathbf{X}$ in t stages. At stage $l$, decoder $l$ generates a replica $\hat{\mathbf{X}}_l$ of $\mathbf{X}$. This problem is open for t > 2. We present an inner bound for the admissible-rate region in Section III (Theorem 1).



More recently, Tian and Diggavi [7] gave the following non-trivial inner bound for $\mathcal{R}(d_1,d_2)$ under the assumption that $X$ and $Y_2$ are conditionally independent given $Y_1$, the scalable side-information source coding problem. Note that this conditional independence is the reverse of the stochastic degradedness used in Proposition 1.

Proposition 2: If $X$ and $Y_2$ are conditionally independent given $Y_1$, then a rate pair $(r_1,r_2)$ is $(d_1,d_2)$-admissible if there exist three auxiliary random variables, $U_{12}$, $U_1$ and $U_2$, such that

\[ r_1 \ge I(X; U_1, U_{12} \,|\, Y_1) \]
\[ r_1 + r_2 \ge I(X; U_2, U_{12} \,|\, Y_2) + I(X; U_1 \,|\, Y_1, U_{12}), \]

where

1) $(U_{12}, U_1, U_2)$ is conditionally independent of $(Y_1, Y_2)$ given $X$;

2) there exist functions $\hat{X}_1(U_1,Y_1)$ and $\hat{X}_2(U_2,Y_2)$ such that $E\,\delta_1(X, \hat{X}_1(U_1,Y_1)) \le d_1$ and $E\,\delta_2(X, \hat{X}_2(U_2,Y_2)) \le d_2$.
We present a new inner bound for $\mathcal{R}(\mathbf{d})$ for the general t-decoder problem with arbitrarily correlated side-information in Theorem 1 of Section III.



C. Paper Outline & Notation 

In Section II we formally define $\mathcal{R}(\mathbf{d})$ and $R(\mathbf{d})$ and give the counterexample to [5, Thm. 2]. In Sections III and IV we respectively present new achievability results for $\mathcal{R}(\mathbf{d})$ and $R(\mathbf{d})$. We describe a new lossless source coding problem in Section V, and the paper is concluded in Section VI.

The non-negative real numbers and the natural numbers are written as $\mathbb{R}_+$ and $\mathbb{N}$, respectively. For $s,t \in \mathbb{N}$ with $s < t$, we let $[s,t] = \{s, s+1, s+2, \ldots, t\}$. When $s = 1$, we drop $s$, i.e. $[t] = \{1,2,\ldots,t\}$. Proper subsets and subsets are identified by $\subset$ and $\subseteq$, respectively. Random variables and random sequences are identified by upper case and bolded uppercase letters, respectively. For example, $\mathbf{X} = X_1, X_2, \ldots, X_n$ denotes the random sequence to be replicated at the decoders, and $\mathbf{Y}_l = Y_{l,1}, Y_{l,2}, \ldots, Y_{l,n}$ denotes the side-information at decoder $l$. The letter $U$ is always used to represent auxiliary random variables. The alphabets of random variables are identified by matching calligraphic typeface, e.g. $\mathcal{X}$ and $\mathcal{U}$ are the respective alphabets of $X$ and $U$. A generic element of an alphabet is identified by a matching lowercase letter, e.g. $x \in \mathcal{X}$ and $u \in \mathcal{U}$. The Cartesian product operation is denoted by $\times$, e.g. $\mathcal{X} \times \mathcal{Y}$. The t-fold Cartesian product of a single alphabet/set is identified with a superscript, e.g. $\mathcal{X}^t$ and $\mathbb{R}_+^t$. Tuples from product spaces are identified by boldfaced lowercase letters, e.g. $\mathbf{x} = (x_1, x_2, \ldots, x_n) \in \mathcal{X}^n$.

For notational convenience, the same letter is used to represent a joint pmf and its marginals, e.g. if $(X,Y)$ on $\mathcal{X} \times \mathcal{Y}$ is defined by $p(x,y) = \Pr[X = x, Y = y]$, then $p(x) = \sum_{y \in \mathcal{Y}} p(x,y)$. The symbol $\leftrightarrow$ is used to denote Markov chains, e.g. if $(X,Y,Z)$ on $\mathcal{X} \times \mathcal{Y} \times \mathcal{Z}$ is defined by $p(x,y,z) = \Pr[X = x, Y = y, Z = z]$ where

\[ p(x,y,z) = \begin{cases} p(x,y)\,p(y,z)/p(y), & \text{if } p(y) > 0 \\ 0, & \text{otherwise,} \end{cases} \]

then we write $X \leftrightarrow Y \leftrightarrow Z\ [p]$. Mutual information and entropy are written in the standard fashion [3] using $I$ and $H$, respectively. We sometimes use subscripts for $I$ and $H$ to emphasize that random variables under consideration are defined by a particular pmf, e.g. if $(X,Y)$ is defined by $p(x,y) = \Pr[X = x, Y = y]$, then we write $I_p(X;Y)$.
II. Definitions & Counterexample

A. Successive Refinement with Side-Information

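Consider Figure 4. Let $\mathcal{X}$, $\mathcal{Y}_1$, ..., $\mathcal{Y}_t$ be finite alphabets and set $\mathcal{Y}^* = \mathcal{Y}_1 \times \mathcal{Y}_2 \times \cdots \times \mathcal{Y}_t$. Let

\[ (\mathbf{X}, \mathbf{Y}_1, \mathbf{Y}_2, \ldots, \mathbf{Y}_t) = \{ (X_i, Y_{1,i}, Y_{2,i}, \ldots, Y_{t,i}) \}_{i=1}^{n} \]

denote $n$ $(t+1)$-tuples of random variables that are drawn in an iid manner from $\mathcal{X} \times \mathcal{Y}^*$ according to a generic pmf $q$, where

\[ q(x, y_1, \ldots, y_t) = \Pr[X = x, Y_1 = y_1, \ldots, Y_t = y_t]. \]

We assume that $\mathbf{X} = X_1, X_2, \ldots, X_n$ is known to the encoder and $\mathbf{Y}_l = Y_{l,1}, Y_{l,2}, \ldots, Y_{l,n}$ is known to decoder $l$. The encoder compresses $\mathbf{X}$ with

\[ f^{(n)} : \mathcal{X}^n \to \mathcal{M}_1 \times \mathcal{M}_2 \times \cdots \times \mathcal{M}_t, \]

where $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_t$ are finite sets. The resulting t indices

\[ (M_1, M_2, \ldots, M_t) = f^{(n)}(\mathbf{X}) \]

are sent over channels 1 through t, respectively. The rate of the encoder on channel $l$ (in bits per source symbol) is given by

\[ \kappa_l^{(n)} \triangleq \frac{1}{n} \log_2 |\mathcal{M}_l|, \]

where $|\mathcal{M}_l|$ is the cardinality of $\mathcal{M}_l$.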

Consider decoder $l$. Let $\hat{\mathcal{X}}_l$ be a finite reconstruction alphabet, and let

\[ \delta_l : \mathcal{X} \times \hat{\mathcal{X}}_l \to \mathbb{R}_+ \]

be a per-letter distortion measure. Observe that $\hat{\mathcal{X}}_l$ and $\delta_l$ can be different to those used at the other decoders. We assume that $\delta_l$ is normal^3 in the sense that $\delta_l(x, x^*(x)) = 0$ for all $x \in \mathcal{X}$, where

\[ x^*(x) = \arg\min_{\hat{x} \in \hat{\mathcal{X}}_l} \delta_l(x, \hat{x}). \]

3 It is possible to remove this assumption and extend the results of this paper to general reconstruction alphabets and per-letter distortion measures using the procedure given in [11, Sec. 9.1].
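This decoder is required to generate a replica $\hat{\mathbf{X}}_l = \hat{X}_{l,1}, \hat{X}_{l,2}, \ldots, \hat{X}_{l,n}$ of $\mathbf{X}$ using

\[ g_l^{(n)} : \mathcal{M}_1 \times \mathcal{M}_2 \times \cdots \times \mathcal{M}_l \times \mathcal{Y}_l^n \to \hat{\mathcal{X}}_l^n; \]

that is,

\[ \hat{\mathbf{X}}_l = g_l^{(n)}(M_1, M_2, \ldots, M_l, \mathbf{Y}_l). \]

Finally, the quality of this replica is measured by the average distortion

\[ \Delta_l^{(n)} \triangleq E\left[ \frac{1}{n} \sum_{i=1}^{n} \delta_l(X_i, \hat{X}_{l,i}) \right]. \]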

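Definition 1 (d-Admissible Rates): Suppose $\mathbf{d} = (d_1, d_2, \ldots, d_t) \in \mathbb{R}_+^t$. A rate tuple $\mathbf{r} = (r_1, r_2, \ldots, r_t) \in \mathbb{R}_+^t$ is said to be d-admissible if, for arbitrary $\epsilon > 0$, there exists an $n_\epsilon \in \mathbb{N}$, an encoder $f^{(n_\epsilon)}$ and t decoders $g_1^{(n_\epsilon)}, g_2^{(n_\epsilon)}, \ldots, g_t^{(n_\epsilon)}$ such that

\[ d_l + \epsilon \ge \Delta_l^{(n_\epsilon)}, \quad \forall l \in [t], \text{ and} \]

\[ r_l + \epsilon \ge \kappa_l^{(n_\epsilon)}, \quad \forall l \in [t]. \]

We let $\mathcal{R}(\mathbf{d})$ denote the set of all d-admissible rate tuples.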

We note that Definition 1 matches Tian and Diggavi [6] in that the $l$-th channel (or, refinement) rate $\kappa_l^{(n)}$ is characterised in an individual (or, incremental) manner. In contrast, Steinberg and Merhav [9] define the $l$-th refinement rate in a cumulative manner, e.g. $(1/n) \log(|\mathcal{M}_1||\mathcal{M}_2| \cdots |\mathcal{M}_l|)$. We also note that $\mathcal{R}(\mathbf{d})$ is dependent on the successive-refinement decoding order [7]. That is, if we interchange decoders (keeping the same side-information and distortion constraints at each decoder), then $\mathcal{R}(\mathbf{d})$ will change.

We conclude this section with a summary of some fundamental properties of $\mathcal{R}(\mathbf{d})$. These properties can all be deduced directly from Definition 1. See [6], [9], [12]-[14] for similar discussions.

Proposition 3: The region $\mathcal{R}(\mathbf{d})$ is completely defined by the pair-wise marginal distributions of $X$ with each side-information. Let $q'$ and $q''$ be pmfs on $\mathcal{X} \times \mathcal{Y}^*$, and let $\mathcal{R}(\mathbf{d})[q']$ and $\mathcal{R}(\mathbf{d})[q'']$ denote their respective d-admissible rate regions (assuming the same distortion measures). If $q'(x, y_l) = q''(x, y_l)$ for all $(x, y_l) \in \mathcal{X} \times \mathcal{Y}_l$ and $l \in [t]$, then $\mathcal{R}(\mathbf{d})[q'] = \mathcal{R}(\mathbf{d})[q'']$.
Proposition 4: The region $\mathcal{R}(\mathbf{d})$, for every $\mathbf{d} \in \mathbb{R}_+^t$, is a closed convex subset of $\mathbb{R}_+^t$ that is uniquely determined by its lower boundary

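\[ \{ \mathbf{r} \in \mathcal{R}(\mathbf{d}) : \forall \tilde{\mathbf{r}} \in \mathcal{R}(\mathbf{d}),\ \tilde{r}_l \le r_l\ \forall l \in [t] \Rightarrow \tilde{r}_l = r_l\ \forall l \in [t] \}. \]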

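Proposition 5: The region $\mathcal{R}(\mathbf{d})$ is sum incremental in the sense that rate can always be transferred from higher-index channels to lower-index channels. If $\mathbf{r} \in \mathcal{R}(\mathbf{d})$, then

\[ \mathcal{R}(\mathbf{d}) \supseteq \mathcal{L}(\mathbf{r}) \triangleq \left\{ \tilde{\mathbf{r}} \in \mathbb{R}_+^t : \sum_{k=1}^{l} \tilde{r}_k \ge \sum_{k=1}^{l} r_k,\ \forall l \in [t] \right\}. \tag{3} \]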
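The sum-incremental property can be checked mechanically: a candidate tuple lies in the region implied by an admissible tuple r whenever each of its partial sums is at least the matching partial sum of r. The following is a minimal sketch under that reading of Proposition 5; the function name `in_latent_region` and the numerical tolerance are assumptions made for illustration.

```python
from itertools import accumulate

def in_latent_region(r_tilde, r, tol=1e-12):
    """True if r_tilde is non-negative and its partial sums dominate those of r."""
    if len(r_tilde) != len(r) or min(r_tilde) < 0:
        return False
    return all(a >= b - tol
               for a, b in zip(accumulate(r_tilde), accumulate(r)))

# Hypothetical three-stage rate tuple.
r = (1.0, 0.5, 0.25)
moved_down = (1.5, 0.0, 0.25)   # rate moved from channel 2 to channel 1
moved_up = (0.5, 1.0, 0.25)     # rate moved from channel 1 to channel 2

down_ok = in_latent_region(moved_down, r)   # partial sums 1.5, 1.5, 1.75
up_ok = in_latent_region(moved_up, r)       # first partial sum 0.5 < 1.0
```

Moving rate to a lower-index channel keeps every partial sum at least as large, so `moved_down` passes; moving rate to a higher-index channel reduces the first partial sum, so `moved_up` cannot be inferred to be admissible from r alone.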

We note in passing that Proposition 5 also holds in a more universal setting. Suppose $\mathbf{r} \in \mathbb{R}_+^t$. Consider all combinations of the source distribution, distortion measures and distortion tuple (e.g., $\mathcal{X}$, $\mathcal{Y}^*$, $q$, $\{\delta_l\}_{l=1}^{t}$ and $\mathbf{d} \in \mathbb{R}_+^t$) such that the resulting d-admissible rate region $\mathcal{R}(\mathbf{d})[q]$ contains $\mathbf{r}$. The proposition shows that $\mathcal{L}(\mathbf{r})$ is an inner bound for every such region. In addition, it can be shown that $\mathcal{L}(\mathbf{r})$ is maximal in the sense that $\mathcal{L}(\mathbf{r}) = \mathcal{R}(\mathbf{d})[q]$ for some choice of $\mathcal{X}$, $\mathcal{Y}^*$, $q$, $\{\delta_l\}$ and $\mathbf{d}$. Therefore, the d-admissibility of $\tilde{\mathbf{r}} \notin \mathcal{L}(\mathbf{r})$ cannot be inferred from the d-admissibility of $\mathbf{r}$ without specific consideration of the source distribution, distortion measures and distortion tuple. For this reason, $\mathcal{L}(\mathbf{r})$ can be called the latent admissible rate region implied by $\mathbf{r}$. See, for example, [14].

We give an inner bound for $\mathcal{R}(\mathbf{d})$ in Theorem 1 of Section III. However, before giving this bound, it is useful to formally define the rate-distortion function $R(\mathbf{d})$ (fig. 3) and then review Heegard and Berger's functional $R_0(\mathbf{d})$.



B. Rate Distortion with Side-Information at t-decoders 

The rate-distortion function $R(\mathbf{d})$ for the problem shown in Figure 3 can be recovered from $\mathcal{R}(\mathbf{d})$ by restricting the code rate on channels 2 through t to be zero.

Definition 2: The rate-distortion function for lossy source coding with side-information at t decoders (fig. 3) is defined by

\[ R(\mathbf{d}) = \min \{ r \in \mathbb{R}_+ : (r, 0, 0, \ldots, 0) \in \mathcal{R}(\mathbf{d}) \}, \]

where the indicated minimum exists because $\mathcal{R}(\mathbf{d})$ is closed and bounded from below.
It should be noted that Definition 2 technically permits the use of codes with asymptotically-vanishing rates on channels 2 through t. That is, the d-admissibility of rates approaching $R(\mathbf{d})$ from above can be proved using a sequence of codes where $\kappa_1^{(n)} \to R(\mathbf{d})$ and $\kappa_l^{(n)} \to 0$ for all $l \in [2,t]$. Such codes, however, are not permitted in the single-channel rate-distortion problem (fig. 3); we can only use codes with $\kappa_l^{(n)} = 0$ for all $l \in [2,t]$. Despite this subtle difference, Definition 2 is equivalent to the definition used in [5] because any message transmitted on channels 2 through t can be transferred^4 to channel 1 (see Proposition 5).



As mentioned in Section II-A, $\mathcal{R}(\mathbf{d})$ depends on the successive-refinement decoding order. This dependence, of course, is not shared by $R(\mathbf{d})$. Indeed, the aforementioned rate-transfer argument can be used to show that the decoding order (used to define $\mathcal{R}(\mathbf{d})$ in Definition 2) can be interchanged with any other decoding order without altering $R(\mathbf{d})$.

Using the time-sharing principle, it can be shown that $R(\mathbf{d})$ is convex on $\mathbb{R}_+^t$. This convexity ensures that $R(\mathbf{d})$ is continuous on the interior of $\mathbb{R}_+^t$ [16, Thm. 10.1]. Moreover, it can also be verified that $R(\mathbf{d})$ is continuous whenever $d_l = 0$ for some $l \in [t]$; see, for example, [1, Pg. 2].

Proposition 6: The rate-distortion function $R(\mathbf{d})$ is continuous, non-increasing (i.e., $R(\tilde{\mathbf{d}}) \le R(\mathbf{d})$ when $\tilde{d}_l \ge d_l$ for all $l \in [t]$) and convex on $\mathbb{R}_+^t$.

The following proposition for lossless reconstructions can be obtained as an extension to the Slepian-Wolf Theorem [17, Thm. 2], a variant of a more general result by Sgarro [8, Thm. 2], or a special case of Bakshi and Effros [18, Thm. 1].

Proposition 7: If, for every $l \in [t]$, $\hat{\mathcal{X}}_l = \mathcal{X}$ and $\delta_l$ satisfies

\[ \delta_l(x, \hat{x}) = 0 \text{ if } \hat{x} = x, \quad \text{and} \quad \delta_l(x, \hat{x}) > 0 \text{ if } \hat{x} \ne x, \]

then

\[ R(0, 0, \ldots, 0) = \max_{l \in [t]} H(X|Y_l). \]
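Proposition 7 is easy to evaluate numerically, since (in line with Proposition 3) only the pair-wise marginals of X with each side-information are needed. The sketch below computes the largest conditional entropy over the decoders; the helper names and the two-decoder ternary source are assumptions chosen for illustration.

```python
from math import log2

def conditional_entropy(q_xy):
    """H(X|Y) in bits for a joint pmf given as a nested list q_xy[x][y]."""
    q_y = [sum(col) for col in zip(*q_xy)]
    return -sum(p * log2(p / q_y[y])
                for row in q_xy
                for y, p in enumerate(row) if p > 0)

def lossless_rate(pairwise_marginals):
    """Largest H(X|Y_l) over the decoders, one joint pmf of (X, Y_l) per decoder."""
    return max(conditional_entropy(q) for q in pairwise_marginals)

# Hypothetical source: X uniform on {0, 1, 2}.
third = 1.0 / 3.0
q_xy1 = [[third], [third], [third]]                      # decoder 1: Y_1 is constant
q_xy2 = [[third, 0, 0], [0, third, 0], [0, 0, third]]    # decoder 2: Y_2 = X
rate = lossless_rate([q_xy1, q_xy2])
```

Here the decoder with useless side-information dominates, so the rate is H(X) = log2(3) bits even though decoder 2 already knows X.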

To review Heegard and Berger's work on $R(\mathbf{d})$ for generic distortion tuples, we first need to define $(2^t - 1)$ auxiliary random variables, one for every non-empty subset of decoders. For this purpose, arrange the non-empty subsets of $[t]$ into a list $\mathcal{S}_1, \mathcal{S}_2, \ldots, \mathcal{S}_{2^t-1}$ (the ordering is not important). For each $j \in [2^t - 1]$, let $\mathcal{U}_{\mathcal{S}_j}$ be a finite alphabet. Define $\mathcal{U}^* = \mathcal{U}_{\mathcal{S}_1} \times \mathcal{U}_{\mathcal{S}_2} \times \cdots \times \mathcal{U}_{\mathcal{S}_{2^t-1}}$. Let $\mathcal{P}$ denote all those pmfs $p$ on $\mathcal{U}^* \times \mathcal{X} \times \mathcal{Y}^*$ whose $(\mathcal{X} \times \mathcal{Y}^*)$-marginal is equal to the source distribution $q$:

\[ p(x, y_1, \ldots, y_t) = \sum_{(u_1, u_2, \ldots, u_{2^t-1}) \in \mathcal{U}^*} p(u_1, u_2, \ldots, u_{2^t-1}, x, y_1, \ldots, y_t) = q(x, y_1, y_2, \ldots, y_t). \]

4 In general, it is difficult to prove the equivalence of asymptotically-vanishing rates and zero-capacity channels (i.e. "deleting the channel") without such a rate-transfer argument. See, for example, [15].

Each $p \in \mathcal{P}$ specifies a joint pmf for $(2^t - 1)$ auxiliary random variables. We denote these variables by $U_{\mathcal{S}_1}, U_{\mathcal{S}_2}, \ldots, U_{\mathcal{S}_{2^t-1}}$, where $U_{\mathcal{S}_j}$ takes values from $\mathcal{U}_{\mathcal{S}_j}$. Let $\mathcal{A} = \{U_{\mathcal{S}_1}, U_{\mathcal{S}_2}, \ldots, U_{\mathcal{S}_{2^t-1}}\}$, and let

\[ \mathcal{A}^{\supset}_{\mathcal{S}_j} = \{ U_{\mathcal{S}_k} \in \mathcal{A} : \mathcal{S}_k \supset \mathcal{S}_j \} \]

denote those auxiliary random variables associated with (proper) supersets of $\mathcal{S}_j$.

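Let $\mathcal{P}(\mathbf{d})$ denote the set of all $p \in \mathcal{P}$ for which the following two properties are satisfied:
(P1) $p$ factors to form the Markov chain

\[ (U_{\mathcal{S}_1}, U_{\mathcal{S}_2}, \ldots, U_{\mathcal{S}_{2^t-1}}) \leftrightarrow X \leftrightarrow (Y_1, Y_2, \ldots, Y_t)\ [p]; \text{ and} \]

(P2) for every decoder $l \in [t]$ there exists a function $\hat{X}_l(Y_l, U_{\{l\}}, \mathcal{A}^{\supset}_{\{l\}})$ with

\[ E\,\delta_l(X, \hat{X}_l(Y_l, U_{\{l\}}, \mathcal{A}^{\supset}_{\{l\}})) \le d_l. \]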

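Heegard and Berger claimed [5, Thm. 2] that the functional

\[ R_0(\mathbf{d}) = \min_{p \in \mathcal{P}(\mathbf{d})} \sum_{j=1}^{2^t-1} \max_{l \in \mathcal{S}_j} I_p(X; U_{\mathcal{S}_j} \,|\, \mathcal{A}^{\supset}_{\mathcal{S}_j}, Y_l) \tag{4} \]

is an upper bound for $R(\mathbf{d})$ for all finite alphabets $\mathcal{U}_{\mathcal{S}_1}, \mathcal{U}_{\mathcal{S}_2}, \ldots, \mathcal{U}_{\mathcal{S}_{2^t-1}}$ such that $\mathcal{P}(\mathbf{d})$ is non-empty. In the next two examples, we confirm that $R_0(\mathbf{d})$ is an upper bound for $R(\mathbf{d})$ when there are one or two decoders (t = 1 or 2); however, in the third example we show that $R_0(\mathbf{d})$ is not an upper bound for $R(\mathbf{d})$ when there are three or more decoders ($t \ge 3$).

For brevity, we drop set notation for each auxiliary random variable in the following three examples. For example, we write $U_1$, $U_{12}$ and $U_{123}$ in place of $U_{\{1\}}$, $U_{\{1,2\}}$ and $U_{\{1,2,3\}}$, respectively.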
Example 1: If t = 1, then (4) reduces to

\[ R_0(d_1) = \min_{p \in \mathcal{P}(d_1)} I_p(X; U_1 | Y_1) = \min_{p \in \mathcal{P}(d_1)} \{ I_p(X; U_1) - I_p(U_1; Y_1) \}, \tag{5} \]

where the second equality in (5) follows from the chain rule for mutual information and the Markov chain $U_1 \leftrightarrow X \leftrightarrow Y_1\ [p]$. If the cardinality of $\mathcal{U}_{\{1\}}$ is limited to $|\mathcal{U}_{\{1\}}| \le |\mathcal{X}| + 1$, then the right hand side of (5) reduces to the Wyner-Ziv formula (2). □

Example 2: If t = 2, then (4) reduces to

\[ R_0(d_1,d_2) = \min_{p \in \mathcal{P}(d_1,d_2)} \left\{ \max_{l \in \{1,2\}} I_p(X; U_{12} | Y_l) + I_p(X; U_1 | Y_1, U_{12}) + I_p(X; U_2 | Y_2, U_{12}) \right\}. \tag{6} \]

One may invoke the Support Lemma [2, Pg. 310] to show that imposing the cardinality constraints $|\mathcal{U}_{\{1,2\}}| \le |\mathcal{X}| + 5$, $|\mathcal{U}_{\{1\}}| \le |\mathcal{X}||\mathcal{U}_{\{1,2\}}| + 1$, and $|\mathcal{U}_{\{2\}}| \le |\mathcal{X}||\mathcal{U}_{\{1,2\}}| + 1$ does not alter the minimization in (6). It can be shown, see Theorem 2, that $R_0(d_1,d_2) \ge R(d_1,d_2)$. □

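Example 3: If t = 3 and $|\mathcal{Y}_1| = |\mathcal{Y}_2| = |\mathcal{Y}_3| = 1$, then (4) reduces to

\[ R_0(d_1,d_2,d_3) = \min_{p \in \mathcal{P}(d_1,d_2,d_3)} \{ I_p(X; U_{123}) + I_p(X; U_{12} | U_{123}) + I_p(X; U_{13} | U_{123}) + I_p(X; U_{23} | U_{123}) + I_p(X; U_1 | U_{12}, U_{13}, U_{123}) + I_p(X; U_2 | U_{12}, U_{23}, U_{123}) + I_p(X; U_3 | U_{13}, U_{23}, U_{123}) \}. \tag{7} \]

Suppose that $\mathcal{X} = \hat{\mathcal{X}}_1 = \hat{\mathcal{X}}_2 = \hat{\mathcal{X}}_3 = \{0,1,2\}$, and let $X$ be uniform on $\mathcal{X}$. Finally, set

\[ \delta_l(x, \hat{x}) = \begin{cases} 0, & \text{if } \hat{x} = x \\ 1, & \text{otherwise,} \end{cases} \tag{8} \]

for $l = 1, 2, 3$ and require that $d_1 = d_2 = d_3 = 0$.

We now choose the following auxiliary random variables. Set

\[ \mathcal{U}_{\{1,2\}} = \mathcal{U}_{\{1,3\}} = \mathcal{U}_{\{2,3\}} = \{0,1,2\}, \text{ and} \tag{9a} \]

\[ |\mathcal{U}_{\{1\}}| = |\mathcal{U}_{\{2\}}| = |\mathcal{U}_{\{3\}}| = |\mathcal{U}_{\{1,2,3\}}| = 1. \tag{9b} \]

Let $C$ be independent of $X$ and uniform on $\{0,1,2\}$. Using modulo-3 arithmetic, choose

\[ U_{12} = C, \quad U_{13} = X + C, \quad \text{and} \quad U_{23} = X + 2C. \tag{9c} \]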
Note that $X$ can be written as a function of any pair of $U_{12}$, $U_{13}$ and $U_{23}$, and the Markov chain $(U_1, U_2, U_3, U_{12}, U_{13}, U_{23}, U_{123}) \leftrightarrow X \leftrightarrow (Y_1, Y_2, Y_3)$ is trivially satisfied. It follows that these auxiliary random variables are defined by some $p' \in \mathcal{P}(0,0,0)$.
From (9b), it follows that (7) is bounded from above by

\[ R_0(0,0,0) \le I_{p'}(X; U_{12}) + I_{p'}(X; U_{13}) + I_{p'}(X; U_{23}). \tag{10} \]

Furthermore, every mutual information term on the right hand side of (10) is zero from (9c). Since $R_0(0,0,0)$ is non-negative, it follows that $R_0(0,0,0) = 0$; however, from Proposition 7 we have that $R(0,0,0) = H(X) > 0$. This counterexample demonstrates that $R_0(\mathbf{d})$ is not an upper bound for $R(\mathbf{d})$. □
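The two facts used in Example 3 can be confirmed by brute-force enumeration: each of U12, U13 and U23 is individually independent of X, yet each decoder's pair determines X. The sketch below enumerates the nine equiprobable outcomes of (X, C); the helper names and the explicit decoding rules are assumptions written out for this check.

```python
from collections import Counter
from itertools import product
from math import log2

def mutual_information(pairs):
    """I(A;B) in bits when the listed (a, b) outcomes are equiprobable."""
    n = len(pairs)
    pab = Counter(pairs)
    pa = Counter(a for a, _ in pairs)
    pb = Counter(b for _, b in pairs)
    return sum(c / n * log2(c * n / (pa[a] * pb[b])) for (a, b), c in pab.items())

# X and C independent and uniform on {0, 1, 2}: nine equiprobable outcomes.
outcomes = list(product(range(3), repeat=2))
aux = {
    "U12": lambda x, c: c,
    "U13": lambda x, c: (x + c) % 3,
    "U23": lambda x, c: (x + 2 * c) % 3,
}

# Each auxiliary variable on its own reveals nothing about X.
information = {name: mutual_information([(x, f(x, c)) for x, c in outcomes])
               for name, f in aux.items()}

# Decoder l sees the two variables whose index sets contain l.
decoders = {
    1: ("U12", "U13", lambda a, b: (b - a) % 3),
    2: ("U12", "U23", lambda a, b: (b - 2 * a) % 3),
    3: ("U13", "U23", lambda a, b: (2 * a - b) % 3),
}
recovered = {l: all(g(aux[u](x, c), aux[v](x, c)) == x for x, c in outcomes)
             for l, (u, v, g) in decoders.items()}
```

Every entry of `information` is zero and every entry of `recovered` is true, so the right hand side of (10) vanishes while all three decoders reconstruct X exactly.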

It appears that this counterexample does not invalidate any results in the rate-distortion literature. In particular, those papers that cite [5, Thm. 3] are either concerned with the special
case of 2 decoders or stochastically degraded side-information. See, for example, 0, [|6j, [J7J, 
[9], [10]. The case of stochastically degraded side-information is discussed in the next section.

When t = 3, we can force (4) to become an upper bound for $R(d_1,d_2,d_3)$ by modifying the set $\mathcal{P}(d_1,d_2,d_3)$ on which the minimization takes place. Namely, if we define

\[ \mathcal{P}^*(d_1,d_2,d_3) \triangleq \left\{ p \in \mathcal{P}(d_1,d_2,d_3) : \begin{array}{l} U_{13} \leftrightarrow (X, U_{123}) \leftrightarrow U_{12}\ [p] \\ U_{23} \leftrightarrow (X, U_{123}) \leftrightarrow (U_{12}, U_{13})\ [p] \end{array} \right\}, \tag{11} \]

then it can be shown that

\[ R_0^*(d_1,d_2,d_3) = \min_{p \in \mathcal{P}^*(d_1,d_2,d_3)} \sum_{j=1}^{7} \max_{l \in \mathcal{S}_j} I_p(X; U_{\mathcal{S}_j} \,|\, \mathcal{A}^{\supset}_{\mathcal{S}_j}, Y_l) \]

is an upper bound for $R(d_1,d_2,d_3)$. The additional Markov chains in (11) are sufficient to verify, via classical random coding techniques, the admissibility of rates approaching $R_0^*(d_1,d_2,d_3)$ from above. In general, this approach can be extended to t > 3 decoders by carefully choosing appropriate Markov chains for each of the $(2^t - 1)$ auxiliary random variables^5. For example, if $U_{\mathcal{S}_j}$ is chosen to be degenerate (constant) whenever $\mathcal{S}_j$ is not of the form $[l,t]$ for some $l \in [t]$, then one obtains appropriate Markov chains and a valid upper bound for $R(\mathbf{d})$. In fact, this particular choice of auxiliary random variables is optimal when the side-information is stochastically degraded.

5 In Section IV we will take a slightly more general approach wherein the mutual information terms in (4), rather than the minimization set $\mathcal{P}(\mathbf{d})$, are modified to produce an upper bound for $R(\mathbf{d})$. We would like to thank Dr. Chao Tian as well as an anonymous reviewer for suggesting this more general approach.

C. Rate-Distortion with Degraded Side-Information 

The side-information, as defined by $q$, is said to be degraded if $X \leftrightarrow Y_t \leftrightarrow Y_{t-1} \leftrightarrow \cdots \leftrightarrow Y_1\ [q]$ forms a Markov chain. The side-information is said to be stochastically degraded if there exists a pmf $q'$ on $\mathcal{X} \times \mathcal{Y}^*$ where $X \leftrightarrow Y_t \leftrightarrow Y_{t-1} \leftrightarrow \cdots \leftrightarrow Y_1\ [q']$ forms a Markov chain and $q(x,y_l) = q'(x,y_l)$ for every $(x,y_l) \in \mathcal{X} \times \mathcal{Y}_l$ and $l \in [t]$. If $\mathcal{R}(\mathbf{d})[q]$ and $\mathcal{R}(\mathbf{d})[q']$ are the respective d-admissible rate regions for $q$ and $q'$, then this condition and Proposition 3 ensure that $\mathcal{R}(\mathbf{d})[q] = \mathcal{R}(\mathbf{d})[q']$. Thus, it is sufficient to consider degraded side-information.

When the side-information is degraded, $R(\mathbf{d})$ can be characterised using t auxiliary random variables. These variables are $U_{[1,t]}, U_{[2,t]}, \ldots, U_{\{t\}}$, and the corresponding subsets of decoders are $[1,t], [2,t], \ldots, \{t\}$. To formally define these variables using the notation of Section II-B, choose $|\mathcal{U}_{\mathcal{S}_j}| = 1$ whenever $\mathcal{S}_j$ is not of the form $[l,t]$ for any $l \in [t]$, and let $\mathcal{P}_{deg}(\mathbf{d})$ denote the resultant set of $p \in \mathcal{P}$ that satisfy properties (P1) and (P2).

Proposition 8: If $X \leftrightarrow Y_t \leftrightarrow Y_{t-1} \leftrightarrow \cdots \leftrightarrow Y_1\ [q]$ forms a Markov chain, then

\[ R(\mathbf{d}) = \min_{p \in \mathcal{P}_{deg}(\mathbf{d})} \sum_{l=1}^{t} I_p(X; U_{[l,t]} \,|\, Y_l, U_{[1,t]}, U_{[2,t]}, \ldots, U_{[l-1,t]}), \tag{12} \]

where the cardinality of each set ^r i)t i is bound by 

i%„i < w n - 1 +* - ' + (f '' +1) 2 (f '' +2) ■ 

The converse theorem for this result can be found on [5, Pgs. 733-734]. Note, however, that the 
use of Ro(d) in [5, Thm. 3] is incorrect. For example, the side-information used in Example ?? 
is trivially degraded. 

Finally, we note that the Markov chain $X \leftrightarrow Y_t \leftrightarrow Y_{t-1} \leftrightarrow \cdots \leftrightarrow Y_1\ [q]$ appears to be essential for the converse theorem [5, Pgs. 733-734]. In contrast, the coding theorem that proves the admissibility of rates approaching (12) is less dependent on this assumption. Indeed, this Markov chain can be disregarded provided there is an appropriate increase in rate. For example, the functional

\[ \min_{p \in \mathcal{P}_{deg}(\mathbf{d})} \sum_{l=1}^{t} \max_{l' \in [l,t]} I_p(X; U_{[l,t]} \,|\, Y_{l'}, U_{[1,t]}, U_{[2,t]}, \ldots, U_{[l-1,t]}) \]

is an upper bound for $R(\mathbf{d})$. We will extend this idea in the next section to give an inner bound for $\mathcal{R}(\mathbf{d})$.

III. Main Results for $\mathcal{R}(\mathbf{d})$

A. An Inner Bound for $\mathcal{R}(\mathbf{d})$

We now present a new inner bound for $\mathcal{R}(\mathbf{d})$. This bound will require an auxiliary random variable for each non-empty subset of decoders. For this purpose, arrange the non-empty subsets of $[t]$ into an ordered list $v = \mathcal{S}_1, \mathcal{S}_2, \ldots, \mathcal{S}_{2^t-1}$ with decreasing cardinality. That is, $|\mathcal{S}_j| \ge |\mathcal{S}_k|$ whenever $j < k$. Let $\mathcal{V}$ denote the set of all such lists.
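One such list can be generated directly. The sketch below builds a single ordering of the non-empty subsets of [t] by decreasing cardinality and extracts the proper supersets of a given subset; the function names are assumptions for illustration, and ties between subsets of equal size are broken lexicographically, which is only one of the many valid lists.

```python
from itertools import combinations

def subsets_by_decreasing_cardinality(t):
    """One ordering of the non-empty subsets of {1, ..., t}, largest first."""
    decoders = range(1, t + 1)
    return [frozenset(s)
            for size in range(t, 0, -1)
            for s in combinations(decoders, size)]

def proper_supersets(subsets, j):
    """Subsets in the list that strictly contain subsets[j]."""
    return [s for s in subsets if s > subsets[j]]

v = subsets_by_decreasing_cardinality(3)
j = v.index(frozenset({1}))
supersets_of_1 = proper_supersets(v, j)   # {1,2,3}, {1,2} and {1,3}
```

Because the list is sorted by decreasing cardinality, every proper superset of a subset appears before that subset, which is what lets the corresponding auxiliary random variables act as conditioning variables in the order given by the list.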

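Fix $v \in \mathcal{V}$. Let $\mathcal{U}_{\mathcal{S}_1}, \mathcal{U}_{\mathcal{S}_2}, \ldots, \mathcal{U}_{\mathcal{S}_{2^t-1}}$ be finite alphabets and define $\mathcal{U}^* = \mathcal{U}_{\mathcal{S}_1} \times \mathcal{U}_{\mathcal{S}_2} \times \cdots \times \mathcal{U}_{\mathcal{S}_{2^t-1}}$. Let $\mathcal{P}_v$ denote the set of all distributions on $\mathcal{U}^* \times \mathcal{X} \times \mathcal{Y}^*$ whose $(\mathcal{X} \times \mathcal{Y}^*)$-marginal is equal to $q$; that is, $p(x, y_1, \ldots, y_t) = q(x, y_1, \ldots, y_t)$.

As before, each $p \in \mathcal{P}_v$ specifies a joint distribution for $(2^t - 1)$ auxiliary random variables. We denote these variables by $U_{\mathcal{S}_j}$, $j = 1, 2, \ldots, 2^t - 1$, where $U_{\mathcal{S}_j}$ takes values from $\mathcal{U}_{\mathcal{S}_j}$. Let $\mathcal{A} = \{U_{\mathcal{S}_1}, U_{\mathcal{S}_2}, \ldots, U_{\mathcal{S}_{2^t-1}}\}$, and define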

A |[/^. e ^ : z < j, ^ ^ J^} and 

^ A G ^ . ^ D j^ J _ 

We note that the union of and srfy. is the set of all those auxiliary random variables 
associated with subsets that appear before in v. Let us further define 

3^ fc g . 



E7>. Eji/Z : 3 > and 

J ^ n ^ fc ^ 

^J,^ A |^ e ^ . ^ a / j when / G J^- . 

Finally, let $\mathcal{P}_v(\mathbf{d})$ denote the set of all $p \in \mathcal{P}_v$ satisfying properties (P1) and (P2) from Section II-B.



Our inner bound for $\mathcal{R}(\mathbf{d})$ will be built using the following functional. For each subset $\mathcal{S}_j \subseteq [t]$ and $l \in [t]$ such that $\mathcal{S}_j \cap [l] \ne \emptyset$, let

$ p (^-,/) 4/ p (X,^4;t^|^) - min / p (t^; ^j. J# ,^|^) . (13) 
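Finally, for each $p \in \mathcal{P}_v(\mathbf{d})$, define^6

\[ \mathcal{R}_{p,v}(\mathbf{d}) \triangleq \left\{ \mathbf{r} \in \mathbb{R}_+^t : \sum_{i=1}^{l} r_i \ge \sum_{\substack{\mathcal{S}_j \subseteq [t], \\ \mathcal{S}_j \cap [l] \ne \emptyset}} \Phi_p(\mathcal{S}_j, l),\ \forall l \in [t] \right\}, \]

and let

\[ \mathcal{R}_{in}(\mathbf{d}) \triangleq \mathrm{co} \left( \bigcup_{v \in \mathcal{V}} \bigcup_{p \in \mathcal{P}_v(\mathbf{d})} \mathcal{R}_{p,v}(\mathbf{d}) \right), \]

where $\mathrm{co}(\cdot)$ denotes the closure of the convex hull.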

Theorem 1: If $\mathbf{d} \in \mathbb{R}_+^t$, then every rate tuple within $\mathcal{R}_{in}(\mathbf{d})$ is d-admissible; that is,

\[ \mathcal{R}_{in}(\mathbf{d}) \subseteq \mathcal{R}(\mathbf{d}). \]

Our proof of this result is given in Appendix A.



B. Stochastically Degraded Side-Information 

Assuming that the side-information is stochastically degraded, Tian and Diggavi gave a single-letter characterisation of $\mathcal{R}(\mathbf{d})$ in [6, Thm. 1] (see Proposition 1). We now show that the forward (coding) part of this result can be obtained as a special case of Theorem 1.

We can assume that $X \leftrightarrow Y_t \leftrightarrow Y_{t-1} \leftrightarrow \cdots \leftrightarrow Y_1\ [q]$ forms a Markov chain. Recall $\mathcal{P}_{deg}(\mathbf{d})$ from Section II-C. Each $p \in \mathcal{P}_{deg}(\mathbf{d})$ specifies a joint distribution for t non-degenerate auxiliary random variables. These variables are $U_{[1,t]}, U_{[2,t]}, \ldots, U_{\{t\}}$ and the associated subsets are $[1,t], [2,t], \ldots, \{t\}$, respectively. We can ignore the degenerate random variables in $\mathcal{A}$, so that for all $l \in [1,t]$ we have

**M = {U[i,t],U [2 , t ], . . . , / '/ :./ } , (14a) 



*M = ^ and 



j^, = V/'e [/,*]. 



(14b) 
(14c) 



6 One can invoke the Support Lemma [2, Pg. 310] to upper bound the cardinality of each set $\mathcal{U}_{\mathcal{S}_j}$. Note that these bounds will depend on the particular choice of list $v$.
On combining the Markov chain $(U_{[1,t]}, U_{[2,t]}, \ldots, U_{\{t\}}) \leftrightarrow X \leftrightarrow (Y_1, Y_2, \ldots, Y_t)\ [p]$ with the Markov chain $X \leftrightarrow Y_t \leftrightarrow Y_{t-1} \leftrightarrow \cdots \leftrightarrow Y_1\ [p]$, we obtain the following Markov chains:



On substituting (14a), (14b) and (14c) into (13), we obtain



$ p ([l,t},j) = I p [X;U M 



min I p [U]it\]Y v 



The second term on the right hand side of (16) can be rewritten as 



min I p [Un t] ;Y v 



%t] 



HJU> 



p\ u [i,t] 



H P [U M 



HJUi 



%t] 



max HJUn t] 

l'e[l,j) { ' 



p\ u [i,t] 



% t ] 



P,*] 



where (17) follows from the Markov chain (15), and (18) follows since 



HM 



P \ u [i,t] 



Mi 



jj, ,y)>h p (u M | , , y,) , vz' G [J, j] 

On combining (16) and (19), we get

= I p (x, £/g t] ■ U[i ft ] J - Jp (t/ [i )t] ; F« , jtfjjjj 



(15) 



(16) 



(17) 

(18) 
(19) 



(20) 



From (14a) and since $U_{[l,t]} \leftrightarrow (X, U_{[1,t]}, U_{[2,t]}, \ldots, U_{[l-1,t]}) \leftrightarrow Y_l$ [p] forms a Markov chain, (20) further simplifies to



$\Phi_p([l,t], j) = I_p(X; U_{[l,t]} \mid U_{[1,t]}, U_{[2,t]}, \ldots, U_{[l-1,t]}, Y_l)$ . (21)



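Finally, substituting (21) into the definition of $\mathcal{R}_{p,v}(d)$ proves the d-admissibility of every rate tuple $r \in \mathbb{R}_+^t$ for which there exists some $p \in \mathcal{P}_{deg}(d)$ with

$\sum_{l=1}^{j} r_l \ge \sum_{l=1}^{j} I_p(X; U_{[l,t]} \mid U_{[1,t]}, U_{[2,t]}, \ldots, U_{[l-1,t]}, Y_l)$

for $j = 1, 2, \ldots, t$.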
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C. Side-Information Scalable Source Coding 

If $t = 2$ and the side-information is degraded ($X \leftrightarrow Y_2 \leftrightarrow Y_1$ [q]), then an optimal compression strategy should satisfy the distortion constraints of decoder 2 after the distortion constraints of decoder 1 have been satisfied. See, for example, Section II-C. However, this ordering may not be optimal when the side-information is not degraded. This observation led Tian and Diggavi in [7, Thm. 1] (see Proposition 2) to propose and study the side-information scalable source coding problem. In the context of this paper, this problem is a special case of the successive-refinement problem where $X \leftrightarrow Y_1 \leftrightarrow Y_2$ [q] is assumed to form a Markov chain. We now show that this result can be obtained as a special case of Theorem 1.

Choose the list $v$ as follows: $\mathcal{S}_1 = \{1,2\}$, $\mathcal{S}_2 = \{1\}$ and $\mathcal{S}_3 = \{2\}$. For each $p \in \mathcal{P}_v(d_1, d_2)$, we have the chains $X \leftrightarrow Y_1 \leftrightarrow Y_2$ [p] and $(U_{12}, U_1, U_2) \leftrightarrow X \leftrightarrow (Y_1, Y_2)$, therefore (13) simplifies to

$\Phi_p(\{1,2\}, 1) = I_p(X; U_{12}) - I_p(U_{12}; Y_1)$
$\qquad = I_p(X; U_{12} \mid Y_1)$
$\Phi_p(\{1,2\}, 2) = I_p(X; U_{12}) - \min_{l' \in \{1,2\}} I_p(U_{12}; Y_{l'})$
$\qquad = I_p(X; U_{12}) - I_p(U_{12}; Y_2)$
$\qquad = I_p(X; U_{12} \mid Y_2)$
$\Phi_p(\{1\}, 1) = I_p(X; U_1 \mid U_{12}) - I_p(U_1; Y_1 \mid U_{12})$
$\qquad = I_p(X; U_1 \mid U_{12}, Y_1)$
$\Phi_p(\{1\}, 2) = I_p(X; U_1 \mid U_{12}, Y_1)$
$\Phi_p(\{2\}, 2) = I_p(X; U_2 \mid U_{12}) - I_p(U_2; Y_2 \mid U_{12})$
$\qquad = I_p(X; U_2 \mid U_{12}, Y_2)$ .

On substituting these equalities into the definition of $\mathcal{R}_{p,v}(d_1, d_2)$, it can be seen from Theorem 1 that any rate pair $(r_1, r_2)$ satisfying

$r_1 \ge \Phi_p(\{1,2\}, 1) + \Phi_p(\{1\}, 1)$
$\quad = I_p(X; U_{12} \mid Y_1) + I_p(X; U_1 \mid U_{12}, Y_1)$
$\quad = I_p(X; U_1, U_{12} \mid Y_1)$ ,

and

$r_1 + r_2 \ge \Phi_p(\{1,2\}, 2) + \Phi_p(\{1\}, 2) + \Phi_p(\{2\}, 2)$
$\quad = I_p(X; U_{12} \mid Y_2) + I_p(X; U_1 \mid U_{12}, Y_1) + I_p(X; U_2 \mid U_{12}, Y_2)$
$\quad = I_p(X; U_2, U_{12} \mid Y_2) + I_p(X; U_1 \mid U_{12}, Y_1)$

for some $p \in \mathcal{P}_v(d_1, d_2)$ is d-admissible. This condition matches the desired inner bound [7, Thm. 1] (Proposition 2).
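The simplifications in this subsection repeatedly use the fact that, when $U \leftrightarrow X \leftrightarrow Y$ forms a Markov chain, $I(X;U) - I(U;Y) = I(X;U \mid Y)$. The following is a minimal numerical sketch of that identity. The toy source (a uniform binary $X$ observed through two binary symmetric channels) and all function names are illustrative assumptions, not part of the paper.

```python
import math
from itertools import product


def marginal(joint, idx):
    """Marginal pmf over the coordinate positions listed in idx."""
    out = {}
    for outcome, pr in joint.items():
        key = tuple(outcome[i] for i in idx)
        out[key] = out.get(key, 0.0) + pr
    return out


def mutual_information(joint, a, b, c=()):
    """I(A;B|C) in bits; a, b, c are tuples of coordinate positions."""
    p_abc = marginal(joint, a + b + c)
    p_ac = marginal(joint, a + c)
    p_bc = marginal(joint, b + c)
    p_c = marginal(joint, c)
    total = 0.0
    for outcome, pr in joint.items():
        if pr == 0.0:
            continue
        va = tuple(outcome[i] for i in a)
        vb = tuple(outcome[i] for i in b)
        vc = tuple(outcome[i] for i in c)
        ratio = p_abc[va + vb + vc] * p_c[vc] / (p_ac[va + vc] * p_bc[vb + vc])
        total += pr * math.log2(ratio)
    return total


def bsc(flip):
    """Transition probabilities of a binary symmetric channel (assumed toy model)."""
    return lambda out, inp: 1.0 - flip if out == inp else flip


# Hypothetical toy source: X is uniform, and U and Y are noisy copies of X,
# so U <-> X <-> Y holds by construction. Coordinates are ordered (u, x, y).
u_given_x, y_given_x = bsc(0.1), bsc(0.2)
joint = {
    (u, x, y): 0.5 * u_given_x(u, x) * y_given_x(y, x)
    for u, x, y in product((0, 1), repeat=3)
}

U, X, Y = (0,), (1,), (2,)
difference = mutual_information(joint, X, U) - mutual_information(joint, U, Y)
conditional = mutual_information(joint, X, U, Y)
print(difference, conditional)  # the two values agree up to rounding
```

The same helper can be pointed at any finite joint pmf, so it may also serve to spot-check the other conditional forms above, provided the assumed Markov chains actually hold for the pmf supplied.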

IV. Main Results for the Wyner-Ziv Problem with t-Decoders
A. An Upper Bound for R(d)

Recall Figure 3 and the rate-distortion function R(d).



Theorem 2: 



2 f -l 

R(d) < min V 

v e -r, -, 

3=1 

P G S»„(d) 



I p (X, ^ ; U yj l^g.) - min I p {U,y-^^Y v | ^ ) 



(22) 



We note the following special cases where this upper bound is known to be tight. For one decoder, the right hand side of (22) gives the Wyner-Ziv formula (2). For t-decoders and degraded side-information, the right hand side of (22) is equal to the right hand side of (12). (Set $U_{\mathcal{S}_j}$ to be a constant whenever $\mathcal{S}_j$ is not of the form $[l,t]$ for some $l \in [t]$, and follow the reasoning given in Section III-B.) In fact, this upper bound is tight whenever $X \leftrightarrow Y_{a_1} \leftrightarrow Y_{a_2} \leftrightarrow \cdots \leftrightarrow Y_{a_t}$, where $a_l$, $l = 1, 2, \ldots, t$, each take unique values from $[t]$ (see Remark 2). Most importantly, however, this bound avoids those problems suffered by $R_0(d)$ in Example 3.

B. Proof of Theorem 2

The following lemma will be useful for the proof of Theorem 2.



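Lemma 1: Suppose $p \in \mathcal{P}_v(d)$, and recall the functional $\Phi_p(\mathcal{S}_j, l)$ defined in (13). For every $\mathcal{S}_j \subseteq [t]$ and $l, l' \in [t]$ such that $\mathcal{S}_j \cap [l] \neq \emptyset$ and $\mathcal{S}_j \cap [l'] \neq \emptyset$, we have:

(i) $\Phi_p(\mathcal{S}_j, l) \le \Phi_p(\mathcal{S}_j, l')$ when $l' \ge l$, and

(ii) $\Phi_p(\mathcal{S}_j, l) \ge 0$.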
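Proof: The fact that $\Phi_p(\mathcal{S}_j, l) \le \Phi_p(\mathcal{S}_j, l')$ follows because $\mathcal{S}_j \cap [l] \subseteq \mathcal{S}_j \cap [l']$, so the minimum in (13) is taken over a larger set for $l'$ than for $l$. To see that $\Phi_p(\mathcal{S}_j, l) \ge 0$, consider the following. Let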

Z = argmin I p (Uy. ^YAsrf^ , 



then 



$ p , I) = I p (X, ; | - I p (Uy 3 ■ Yj\ 

= I p (X, tff* , Yf Uy j I ^ )-I p {Uy- ^ - v Yj\ ) 

= I p (X, ^,^,Y f Uy.) - I p (Uy.;^.,^ p Yj) 
> , 



(23) 
(24) 
(25) 



where ([23]) follows because Yj -e- (X, g/j^., g/y.) -e- Uy. [p] forms a Markov chain, ( [24] ) follows 



from the chain rule for mutual information, and ([25]) follows from 



D 



We now prove Theorem 2. First, note that the minimum on the right hand side of (22) exists. Suppose that $v$ and $p$ achieve this minimum, and choose any $r \in \mathbb{R}_+$ such that

2*-l r n 

r>]T I p (X,^.;^^)-mm^^^ . (26) 

i=1 L J 

In the following, we prove the d-admissibility of $r$ using Theorem 1.

Consider the successive refinement problem shown in Figure 4, the corresponding d-admissible rate region $\mathcal{R}(d)$ (defined in Section II-A), and the inner bound $\mathcal{R}_{in}(d)$ given in Theorem 1.

In particular, consider the region $\mathcal{R}_{p,v}(d)$, where $v$ and $p$ achieve the aforementioned minimum. Define the t-tuple $\tilde{r} = (r, 0, 0, \ldots, 0)$. It is clear that $r \ge R(d)$ iff $\tilde{r} \in \mathcal{R}(d)$; therefore the result will follow if it can be shown that $\tilde{r} \in \mathcal{R}_{p,v}(d)$.
For every $l \in [t]$, we have

l 2*-l r 

J> > E h^^Uy^) - min I p (Uy.-^ X y,^Y v \^ 



1=1 



3=1 



J^C[t] 

^ 3 C [t], 
■5", n [!] ^ 



(27) 
(28) 
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> , (29) 



S>j n [l] ^ l 



where (27) follows from (13), and Lemma 1 gives (28) and (29). From Theorem 1 we have that $\tilde{r} \in \mathcal{R}_{p,v}(d)$ and $\tilde{r} \in \mathcal{R}(d)$, therefore $r \ge R(d)$. □

Remark 1: Theorem 2 is a consequence of the inner bound $\mathcal{R}_{in}(d)$ given in Theorem 1. Like $\mathcal{R}(d)$, $\mathcal{R}_{in}(d)$ depends on the successive-refinement decoding order: if we interchange the decoders (keeping the same side-information and distortion constraints at each decoder), then the resulting inner bound $\mathcal{R}_{in}(d)$ will change. One might, therefore, be inspired to pursue a stronger version of Theorem 2 wherein the choice of successive-refinement order is optimized. Note, however, that the proof of Theorem 2 requires only the bound for $r_1 + r_2 + \cdots + r_t$ in $\mathcal{R}_{p,v}(d)$, and this bound is independent of the successive-refinement decoding order.

V. Lossless Source Coding with Private Messages 

In Proposition [7J we reviewed a broadcast problem wherein X is reconstructed losslessly at 
every decoder. This lossless problem can be easily solved as a variant of existing work by Slepian 
and Wolf JT7| Thm. 2]; Sgarro (§ Thm. 2]; or Bakshi and Effros (TS| Thm. 1]. In this section, 



we consider a more complex scenario wherein each decoder is required to decode one part of 
X losslessly. 

Let $\mathcal{W}_1, \mathcal{W}_2, \ldots, \mathcal{W}_t$ be finite alphabets, and consider the problem shown in Figure 5. In the nomenclature of previous sections, set $\mathcal{X} = \mathcal{W}_1 \times \mathcal{W}_2 \times \cdots \times \mathcal{W}_t$, $X = (W_1, W_2, \ldots, W_t)$, and let $(W_1, W_2, \ldots, W_t, Y_1, Y_2, \ldots, Y_t)$ be drawn iid according to $q(w_1, w_2, \ldots, w_t, y_1, y_2, \ldots, y_t)$. It is required that decoder $l$ reconstructs $W_l$ with vanishing probability of symbol error. To this end, set $\hat{\mathcal{X}}_l = \mathcal{W}_l$ and define the average symbol error probability at decoder $l$ to be

$P_e^l \triangleq \frac{1}{n} \sum_{i=1}^{n} P_{e,i}^l$

where $P_{e,i}^l \triangleq E[\delta_l(W_{1,i}, W_{2,i}, \ldots, W_{t,i}, \hat{W}_{l,i})]$,

$\delta_l(w_1, w_2, \ldots, w_t, \hat{w}_l) \triangleq \begin{cases} 0, & \text{if } \hat{w}_l = w_l \\ 1, & \text{otherwise,} \end{cases}$ (30)
Fig. 5. Lossless Source Coding with Private Messages. The encoder compresses $(W_1, W_2, \ldots, W_t)$ to $(M_1, M_2, \ldots, M_t)$. It is required that decoder $l$ uses $M_1$ through $M_l$ together with $Y_l$ to produce a lossless replica $\hat{W}_l$ of $W_l$. In Theorem 3 we give an explicit characterisation of $\mathcal{R}(0, 0, \ldots, 0)$ for degraded side-information $(W_1, W_2, \ldots, W_t) \leftrightarrow Y_t \leftrightarrow Y_{t-1} \leftrightarrow \cdots \leftrightarrow Y_1$.



defines the probability of error for the $i$th symbol.

A computable characterisation of $\mathcal{R}(0, 0, \ldots, 0)$ has yet to be found. A direct application of Theorem 1 yields an inner bound for $\mathcal{R}(0, 0, \ldots, 0)$; however, it is not clear if this bound is tight. The next theorem shows that this bound is tight when the side-information is degraded. Although this result is a special case of Proposition 1, we state it here in an explicit form (without auxiliary random variables) to highlight the generality of this problem.



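Theorem 3: If $(W_1, W_2, \ldots, W_t) \leftrightarrow Y_t \leftrightarrow Y_{t-1} \leftrightarrow \cdots \leftrightarrow Y_1$ [q] and $\delta_l$ is given by (30), then

$\mathcal{R}(0, 0, \ldots, 0) = \left\{ r \in \mathbb{R}_+^t : \sum_{k=1}^{l} r_k \ge \sum_{k=1}^{l} H(W_k \mid W_1, W_2, \ldots, W_{k-1}, Y_k), \ l = 1, 2, \ldots, t \right\}$ .

The lossless one-channel version of Theorem 3 follows immediately.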



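Corollary 3.1: If $(W_1, W_2, \ldots, W_t) \leftrightarrow Y_t \leftrightarrow Y_{t-1} \leftrightarrow \cdots \leftrightarrow Y_1$ [q] and $\delta_l$ is given by (30), then

$R(0, 0, \ldots, 0) = \sum_{l=1}^{t} H(W_l \mid W_1, W_2, \ldots, W_{l-1}, Y_l)$ .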
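To make the formula in Corollary 3.1 concrete, the sketch below evaluates $\sum_{l} H(W_l \mid W_1, \ldots, W_{l-1}, Y_l)$ for a finite joint pmf. The function names, the outcome layout $(w_1, \ldots, w_t, y_1, \ldots, y_t)$ and the two-decoder toy source are assumptions made for illustration only; the code does not check that the supplied pmf has degraded side-information.

```python
import math


def conditional_entropy(joint, target, given):
    """H(target | given) in bits; joint maps outcome tuples to probabilities."""
    p_tg, p_g = {}, {}
    for outcome, pr in joint.items():
        t_val = tuple(outcome[i] for i in target)
        g_val = tuple(outcome[i] for i in given)
        p_tg[(t_val, g_val)] = p_tg.get((t_val, g_val), 0.0) + pr
        p_g[g_val] = p_g.get(g_val, 0.0) + pr
    return -sum(
        pr * math.log2(pr / p_g[g_val])
        for (_, g_val), pr in p_tg.items()
        if pr > 0.0
    )


def lossless_sum_rate(joint, t):
    """Sum over l of H(W_l | W_1..W_{l-1}, Y_l).

    Assumed outcome layout: positions 0..t-1 hold w_1..w_t and
    positions t..2t-1 hold y_1..y_t.
    """
    total = 0.0
    for l in range(1, t + 1):
        target = (l - 1,)                          # W_l
        given = tuple(range(l - 1)) + (t + l - 1,)  # W_1..W_{l-1} and Y_l
        total += conditional_entropy(joint, target, given)
    return total


# Hypothetical toy source with t = 2: W_1 and W_2 are independent uniform bits,
# Y_1 is a constant (no side-information) and Y_2 = W_2.
toy = {(w1, w2, 0, w2): 0.25 for w1 in (0, 1) for w2 in (0, 1)}
rate = lossless_sum_rate(toy, 2)
print(rate)
```

In this toy case decoder 1 needs one bit for $W_1$, while decoder 2 already observes $W_2$ through $Y_2$, so the second term contributes nothing.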
Remark 2: The lossless problems considered in this section are equivalent to the concept of deterministic distortion measures [7], [19], wherein certain functions $\{\phi_l(X)\}$ of the source $X$ are to be reconstructed with vanishing symbol error probability at the receivers. If $t = 2$,
$Z_1 = \phi_1(X)$ is to be reconstructed at receiver 1, $Z_2 = \phi_2(X)$ is to be reconstructed at receiver 2, and the side-information is reversibly degraded (i.e. $X \leftrightarrow Y_1 \leftrightarrow Y_2$ [q] forms a Markov chain), then Tian and Diggavi have shown that [7, Cor. 4]

$R(0, 0) = H(Z_2 \mid Y_2) + H(Z_1 \mid Y_1, Z_2)$ .



This result is consistent with Corollary 3.1 in the following sense. The achievability of Corollary 3.1 follows from Theorem 2 by setting $U_{\mathcal{S}_j} = W_l$ whenever $\mathcal{S}_j = [l,t]$ for some $l \in [t]$ and $U_{\mathcal{S}_j}$ = constant otherwise. The bound in Theorem 2 is equal to the rate-distortion function R(d) for every order of degraded side-information. For example, suppose that $X \leftrightarrow Y_{a_t} \leftrightarrow Y_{a_{t-1}} \leftrightarrow \cdots \leftrightarrow Y_{a_1}$ [q] forms a Markov chain, where $a_l$, $l = 1, 2, \ldots, t$, each take unique values from $[t]$. This Markov condition is simply a relabelling of the degradedness considered in Section II-C, so it is appropriate to choose the $t$ non-trivial auxiliary random variables to be $U_{[a_1,a_t]}, U_{[a_2,a_t]}, \ldots, U_{\{a_t\}}$, where $[a_l, a_t] = \{a_l, a_{l+1}, \ldots, a_t\}$. Thus, we can set $U_{[a_l,a_t]} = W_{a_l}$ to restate Corollary 3.1 for an arbitrary order of degraded side-information.

Tian and Diggavi also characterise the successive-refinement region $\mathcal{R}(0, 0)$ in [7, Thm. 4] for $t = 2$ and reversibly degraded side-information. This result is not captured by Theorem 3, and it would be interesting to see if a similar result can be obtained for t-receivers and arbitrary ordering of degraded side-information.

Proof: The forward (coding) part follows by setting $U_l = W_l$ in Proposition 1. The converse theorem requires some work and is given below. For brevity, we use the following notation: $M_{\le l} \triangleq \{M_1, M_2, \ldots, M_l\}$, $W_{\le l} \triangleq \{W_1, W_2, \ldots, W_l\}$ and $Y_{\le l} \triangleq \{Y_1, Y_2, \ldots, Y_l\}$.
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By definition, we have 

/ , i 



k=l 



fc=l 

> -H(M <1 ) 
1 



> -/(W^Y^M^) 

1 - 

fc=i 
1 - 

> -^/(W^M^W^Y^) 



fe=i 
i 



> 
fc=i 

fc=l 



[^(WfclW^.Y^) -ff(W fc |M<,,W< fc _i,Y< fc ) 

fc=i 

. i n 

[ H {W k ,i\W< k ^,Y< k , W k)1 , W k , 2 , W k ^ x ) 

k=l 1=1 

- (W M |M<,, W< fc _!, Y< fc , W kjl , W k>2 , W k) i-i) 

l - £ 71 

£if (W fc |Wi, W 2 , . . . , W fc _ a , Y fc ) - - 5lH(W kti \W k J 

k=l i=l 
1 £ n 

>^-i^)--££[M^)+n fe >g 

fe=i i=i 

> ^ i? (W fc | W ls W 2) . . . , W k -1, Y k ) - [h{ P e) + P e ^2 W 



2 \"k 



k=l 
t 

>Y,H(W l \W l ,W 2 ,..., 

i=i 



k=l 



W^Yt)- lh(e) + e\og 2 \W 1 \ \W 2 



(31) 

(32) 
(33) 

(34) 
(35) 
(36) 



(37) 
(38) 

(39) 

(40) 

(41) 



where (31) through (37) follow from standard Shannon inequalities; (38) follows because $(W_1, \ldots, W_t, Y_1, \ldots, Y_t)$ is iid, $W_k \leftrightarrow (W_1, W_2, \ldots, W_{k-1}, Y_k) \leftrightarrow (Y_1, Y_2, \ldots, Y_{k-1})$ forms a Markov chain, conditioning reduces entropy and $\hat{W}_{k,i}$ is a function of $M_1, M_2, \ldots, M_k$ and $Y_k$; (39) follows from Fano's Inequality where $h(\cdot)$ is the binary entropy function [3]; (40) follows from the concavity of $h(\cdot)$ and Jensen's inequality; (41) follows by assuming $\epsilon$ is small (i.e. $0 \le P_e^k \le \epsilon \le 1/2$). Finally, $l \, [h(\epsilon) + \epsilon \log_2 |\mathcal{W}_1| |\mathcal{W}_2| \cdots |\mathcal{W}_l|] \to 0$ as $\epsilon \to 0$. ■
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VI. Conclusion 

We studied the rate-distortion function R(d) and the rate region $\mathcal{R}(d)$ for the problems shown in Figures 3 and 4, respectively. In [5, Thm. 2], Heegard and Berger claimed that a certain functional, $R_0(d)$, is an upper bound for R(d). By way of a counterexample, we demonstrated that $R_0(d)$ is not an upper bound for R(d). In Theorem 2, we gave a new upper bound for R(d). This bound followed from a new inner bound for $\mathcal{R}(d)$ that we presented in Theorem 1. Finally, we gave an explicit characterisation of the rates needed to losslessly reconstruct private messages at each decoder (assuming degraded side-information) in Theorem 3.
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Appendix A 
Proof of Theorem 1

Fix $v \in \mathcal{V}$ and $p \in \mathcal{P}_v(d)$ arbitrarily. It is sufficient to prove the d-admissibility of rate tuples within $\mathcal{R}_{p,v}(d)$. (The d-admissibility of tuples within $\mathcal{R}_{in}(d)$ follows by standard time-sharing arguments.) Our proof uses a random-coding argument that is based on the concept of $\epsilon$-letter typical sequences$^7$. This argument employs $(2^t - 1)$ randomly generated codebooks; one codebook for every non-empty subset of receivers. The encoder selects a codeword from each codebook and sends some information (the bin indices of each codeword) to the decoders. Each decoder tries to recover those codewords where it is a member of the corresponding subset. To help elucidate the main ideas of the proof, we present the special case of four decoders as a series of examples in parallel to the main proof.

For notational convenience, we impose the natural ordering on the elements of each subset $\mathcal{S}_j$, and we let $\mathcal{S}_j[i]$ denote the $i$th smallest element of $\mathcal{S}_j$. For example, if $\mathcal{S}_j = \{1, 3, 5\}$, then $\mathcal{S}_j[1] = 1$, $\mathcal{S}_j[2] = 3$ and $\mathcal{S}_j[3] = 5$.

7 We have reviewed the relevant $\epsilon$-letter typical results in Appendix B for convenience; a more detailed treatment can be found in [20].
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A. Code Construction 

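For each subset $\mathcal{S}_j$ construct an $|\mathcal{S}_j|$-layer nested codebook in the following manner. For each vector-valued index $k_{\mathcal{S}_j} = (k_{\mathcal{S}_j,1}, k_{\mathcal{S}_j,2}, \ldots, k_{\mathcal{S}_j,|\mathcal{S}_j|}, k'_{\mathcal{S}_j})$, where

$k_{\mathcal{S}_j,i} = 1, 2, \ldots, 2^{n R_{\mathcal{S}_j,i}}$ , $i = 1, 2, \ldots, |\mathcal{S}_j|$ ,

$k'_{\mathcal{S}_j} = 1, 2, \ldots, 2^{n R'_{\mathcal{S}_j}}$ ,

generate a length $n$ codeword $u_{\mathcal{S}_j}(k_{\mathcal{S}_j}) \in \mathcal{U}_{\mathcal{S}_j}^n$ by selecting $n$ symbols from $\mathcal{U}_{\mathcal{S}_j}$ in an iid manner using $p(u_{\mathcal{S}_j})$, the $U_{\mathcal{S}_j}$-marginal of $p$. The values of $R_{\mathcal{S}_j,i}$ and $R'_{\mathcal{S}_j}$ will be defined shortly.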

Example 4 (4-Decoders Code Construction): Choose the list $v$ as follows: $\mathcal{S}_1 = \{1,2,3,4\}$, $\mathcal{S}_2 = \{1,2,3\}$, $\mathcal{S}_3 = \{1,2,4\}$, $\mathcal{S}_4 = \{1,3,4\}$, $\mathcal{S}_5 = \{2,3,4\}$, $\mathcal{S}_6 = \{1,2\}$, $\mathcal{S}_7 = \{1,3\}$, $\mathcal{S}_8 = \{1,4\}$, $\mathcal{S}_9 = \{2,3\}$, $\mathcal{S}_{10} = \{2,4\}$, $\mathcal{S}_{11} = \{3,4\}$, $\mathcal{S}_{12} = \{1\}$, $\mathcal{S}_{13} = \{2\}$, $\mathcal{S}_{14} = \{3\}$ and $\mathcal{S}_{15} = \{4\}$. Figure 6 shows the 3-layer nested codebook associated with the subset $\{1,2,3\}$. In the first layer, there are $2^{n R_{123,1}}$ bins (labelled with the index $k_{123,1}$) each of which contain $2^{n(R'_{123} + R_{123,2} + R_{123,3})}$ codewords. The set of codewords inside a particular layer one bin define the second layer of the codebook. Specifically, each layer one index $k_{123,1} \in [2^{n R_{123,1}}]$ identifies $2^{n R_{123,2}}$ layer two bins. These bins are labelled with the index $k_{123,2}$, and each bin contains $2^{n(R'_{123} + R_{123,3})}$ codewords. Similarly, each pair $k_{123,1} \in [2^{n R_{123,1}}]$ and $k_{123,2} \in [2^{n R_{123,2}}]$ identifies $2^{n R_{123,3}}$ layer three bins. There are $2^{n R'_{123}}$ codewords in each one of the layer three bins.
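The bin counts in Example 4 can be tabulated mechanically. The following sketch computes, for each layer of a nested codebook, the number of bins identified by a fixed choice of the earlier indices and the number of codewords inside each such bin. The function name, the block length and the rate values are hypothetical and were chosen only so that every exponent is an integer.

```python
def nested_codebook_sizes(n, layer_rates, private_rate):
    """Return [(bins_in_layer, codewords_per_bin), ...], one pair per layer.

    layer_rates[i] plays the role of R_{S,i+1} and private_rate the role of R'_S.
    bins_in_layer counts the bins identified by one fixed choice of the
    indices of all earlier layers.
    """
    sizes = []
    for i, rate in enumerate(layer_rates):
        bins = round(2 ** (n * rate))
        # Codewords in a bin are indexed by all later layers plus k'.
        remaining = private_rate + sum(layer_rates[i + 1:])
        codewords_per_bin = round(2 ** (n * remaining))
        sizes.append((bins, codewords_per_bin))
    return sizes


# Hypothetical numbers for the subset {1,2,3}: n = 4,
# R_{123,1} = 0.5, R_{123,2} = R_{123,3} = 0.25 and R'_{123} = 0.5.
sizes = nested_codebook_sizes(4, [0.5, 0.25, 0.25], 0.5)
print(sizes)  # [(4, 16), (2, 8), (2, 4)]
```

With these assumed numbers the first layer has 4 bins of 16 codewords each, which accounts for all $2^{n(R_{123,1} + R_{123,2} + R_{123,3} + R'_{123})} = 64$ codewords in the codebook.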

B. Encoding 

Encoding proceeds sequentially over $(2^t - 1)$ stages using $\epsilon$-letter typical-set encoding rules. For this purpose, choose $0 < \epsilon_0 < \epsilon_1 < \cdots < \epsilon_{2^t}$ to be arbitrarily small real numbers. The encoder is given $x \in \mathcal{X}^n$. At encoding stage $j$ it selects the codebook with label $\mathcal{S}_j$ and looks for an index vector $k_{\mathcal{S}_j}$ where the corresponding codeword $u_{\mathcal{S}_j}(k_{\mathcal{S}_j})$ is $\epsilon_j$-letter typical with $x$ and

= {u ,(k,.) : i < j, y, D Yj}, and (42a) 

uL = {u^(k^) : 2 < j, ft t y h 3j?i,, i' > j, y v n ^ ^ 0, ^ n ^ ^ 0} . (42b) 
Fig. 6. 4-Decoders Example: Figure shows the nested bin structure for the codebook $\mathcal{S}_2 = \{1,2,3\}$.



If successful$^8$, the encoder sends the bin index $k_{\mathcal{S}_j,i}$ over channel $\mathcal{S}_j[i]$ for every $i = 1, 2, \ldots, |\mathcal{S}_j|$. If unsuccessful, the encoder sends $k_{\mathcal{S}_j,i} = 1$ over each of these channels.

Note the correspondence between the sets u^,. and u y , and the sets of auxiliary random 
variables and ^(J. , respectively. Finally, note that when \SP$\ > 3, then u y _ U ut^ = 
{u^kyj : i < j}; that is, the encoder chooses uy.(ky.) to be jointly typical with every 
codeword it has previously selected. The situation is more complex when 1=5^1 < 2. 

Example 5 (4-Decoders Encoding): Table I lists the fifteen encoding sets defined in (42), and Figure 7 depicts the index to channel assignments for the four decoder example. In stage 1,

8 If there are two-or-more such codewords, we assume that the encoder selects one codeword arbitrarily and sends the 
corresponding indices. 



April 20, 2010 



DRAFT 



29 



Subset 








&x = {1,2, 3, 4} 











^2 = {1,2, 3} 


{u^i} 








= {1,2,4} 


{u^i} 


{U^ 2 } 


= { u ^2> 

"k,2 = { U ^} 


^4 = {1,3,4} 


{u^} 


{u^ 2 ,u^. 3 } 


= { u ^2."^ 3 } 

"k,3 = { U ^2> 
"k,4 = { U ^3> 


^5 = {2,3,4} 


{u^J 


{u^. 2 ,u^ 3 ,u^ 4 } 


6 k,2 = { U ^2."^ 3 } 
"k,3 = {U^ 2 ,U^ 4 } 
6 k,4 = { U ^3. U ^4} 


^6 = {1,2} 


{11^,115^,11^3} 


{u^ 4 ,u^. 5 } 


"k,2 = { U ^5> 


^7 = {1,3} 


{11^,11.5^,11,5^} 


{u^ 3 ,u^ 5 ,u^ 6 } 


"y 7 ,i = { u ^3, u ^6} 

"k,3 = { U ^5> 


^8 ={1,4} 


{ll^U^U.^} 


{u^ 2 ,u^ 5 ,u^. 6 ,u^ 7 } 




•5*9 = {2,3} 


{u^u^u^J 


{Uj^ 3 , u^ 4 , Uj^ 6 , us* 7 , 


"k,2 = { u ^3. u ^6} 
"k,3 = {UJ^ 4 ,U^ 7 } 


^io = {2,4} 


{u^.U^^^} 


{uj^ 2 , Uj7 4 , Uj^ 6 , U,y 7 , 


fi ko,2 = { U ^2,U^g,U^ 9 } 
"ko,4 = { U ^4- U ^ S } 


={3,4} 


{uj^Uj^u^J 


{u^ 2 ,u^3,u^ 7 ,u^ 8 , 


"kl,3 = { U ^2> U ^7, U ^ 9 } 
"kl,4 = { U ^ 3 > U ^s,U^lo} 


^12 = {1} 


{11,^,11.5^,11,503,11,5^, 








^13 = {2} 


{U^l , U J^2 > U ^3 > U -5"s ' 

Uj^ 6 , U,y 9 , U.y 10 } 








^14 = {3} 










^15 = {4} 


{U^ 1; U^ 3 ,U^ 4 ,U^ 5 , 









TABLE I 

The table lists the fifteen encoding SETS U ™ AND uL as well as the decoding sets u-L and ui, for the 

FOUR DECODER EXAMPLE. 
Fig. 7. 4-Decoders Example: Assignment of bin indices to channels. Subsets are $\mathcal{S}_1 = \{1,2,3,4\}$, $\mathcal{S}_2 = \{1,2,3\}$, $\mathcal{S}_3 = \{1,2,4\}$, $\mathcal{S}_4 = \{1,3,4\}$, $\mathcal{S}_5 = \{2,3,4\}$, $\mathcal{S}_6 = \{1,2\}$, $\mathcal{S}_7 = \{1,3\}$, $\mathcal{S}_8 = \{1,4\}$, $\mathcal{S}_9 = \{2,3\}$, $\mathcal{S}_{10} = \{2,4\}$, $\mathcal{S}_{11} = \{3,4\}$, $\mathcal{S}_{12} = \{1\}$, $\mathcal{S}_{13} = \{2\}$, $\mathcal{S}_{14} = \{3\}$ and $\mathcal{S}_{15} = \{4\}$. Bin index $k_{\mathcal{S}_j,i}$ is sent over channel $\mathcal{S}_j[i]$.
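The rule stated in the caption (bin index $k_{\mathcal{S}_j,i}$ travels on channel $\mathcal{S}_j[i]$) fully determines which indices each channel carries. A minimal sketch of that bookkeeping follows. The function names are hypothetical, and the subset ordering (larger subsets first, lexicographic within a size) is an assumption that happens to reproduce the list $v$ of Example 4; other valid lists may exist.

```python
from itertools import combinations


def subset_list(t):
    """Non-empty subsets of {1..t}: larger subsets first, lexicographic within a size."""
    subsets = []
    for size in range(t, 0, -1):
        subsets.extend(combinations(range(1, t + 1), size))
    return subsets


def channel_assignment(subsets, t):
    """Map channel l to the list of (j, i) pairs whose bin index k_{S_j,i} it carries."""
    channels = {l: [] for l in range(1, t + 1)}
    for j, subset in enumerate(subsets, start=1):
        # S_j[i] is the i-th smallest element of S_j.
        for i, member in enumerate(sorted(subset), start=1):
            channels[member].append((j, i))
    return channels


subsets = subset_list(4)
channels = channel_assignment(subsets, 4)
for l in sorted(channels):
    print(l, channels[l])
```

For four decoders each channel carries eight bin indices; for instance, channel 1 carries the first-layer index of every subset that contains decoder 1.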



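the encoder considers subset $\mathcal{S}_1$ and looks for an index vector $k_{\mathcal{S}_1}$ such that the corresponding codeword $u_{\mathcal{S}_1}(k_{\mathcal{S}_1})$ is jointly typical with $x$. (Both encoding sets are empty; see Table I.) The resulting indices $k_{\mathcal{S}_1,1}$, $k_{\mathcal{S}_1,2}$, $k_{\mathcal{S}_1,3}$ and $k_{\mathcal{S}_1,4}$ are sent over channels 1, 2, 3 and 4, respectively. In the eleventh encoding stage, the encoder takes the codebook for $\mathcal{S}_{11} = \{3, 4\}$ and looks for an index vector $k_{\mathcal{S}_{11}} = (k_{\mathcal{S}_{11},1}, k_{\mathcal{S}_{11},2}, k'_{\mathcal{S}_{11}})$ such that the corresponding codeword $u_{\mathcal{S}_{11}}(k_{\mathcal{S}_{11}})$ is jointly typical with $x$, $u_{\mathcal{S}_1}(k_{\mathcal{S}_1})$ through to $u_{\mathcal{S}_5}(k_{\mathcal{S}_5})$ and $u_{\mathcal{S}_7}(k_{\mathcal{S}_7})$ through to $u_{\mathcal{S}_{10}}(k_{\mathcal{S}_{10}})$. (Note that this codeword need not be jointly typical with $u_{\mathcal{S}_6}(k_{\mathcal{S}_6})$.) The resulting indices $k_{\mathcal{S}_{11},1}$, $k_{\mathcal{S}_{11},2}$ are sent over channels 3 and 4, respectively.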

C. Decoding 

Consider decoder $l$. Like the encoding procedure, decoder $l$ forms its reconstruction $\hat{X}_l$ of $X$ using $(2^t - 1)$ decoding stages. Recall, this decoder recovers every bin index transmitted on channels 1 through $l$; it does not have access to any index transmitted on channels $l + 1$ through $t$.

In stage $j$ decoder $l$ considers subset $\mathcal{S}_j$. If $l \notin \mathcal{S}_j$, then it does nothing and moves to decoding stage $j + 1$. If $l \in \mathcal{S}_j$, then the decoder forms a reconstruction $\hat{u}_{\mathcal{S}_j}(\hat{k}_{\mathcal{S}_j})$ of the codeword $u_{\mathcal{S}_j}(k_{\mathcal{S}_j})$, which was selected by the encoder, using the following procedure. Note, decoder $l$ will have reconstructed the following codewords in decoding stages 1 through $j - 1$:

■ {u/fk/):/ - ./. >./,-}. and 



u 



u 



X 



(43a) 
(43b) 



Note the correspondence between the decoding sets in (43) and the sets of auxiliary random 



variables 



and 



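To form its reconstruction $\hat{u}_{\mathcal{S}_j}(\hat{k}_{\mathcal{S}_j})$, decoder $l$ takes the bin indices

$\{k_{\mathcal{S}_j,i} : i = 1, 2, \ldots, |[l] \cap \mathcal{S}_j|\}$

from channels 1 through $l$. It then looks for an index vector $\hat{k}_{\mathcal{S}_j}$, with $\hat{k}_{\mathcal{S}_j,i} = k_{\mathcal{S}_j,i}$ for all $i = 1, 2, \ldots, |[l] \cap \mathcal{S}_j|$, such that the corresponding codeword $u_{\mathcal{S}_j}(\hat{k}_{\mathcal{S}_j})$ is $\epsilon_{j+1}$-letter typical with $y_l$ as well as the codewords in (43) that were decoded in the first $(j - 1)$ stages: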



(44) 



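Note that there are

$\exp_2 \left[ n \left( R'_{\mathcal{S}_j} + \sum_{i = |[l] \cap \mathcal{S}_j| + 1}^{|\mathcal{S}_j|} R_{\mathcal{S}_j,i} \right) \right]$

codewords in the bin specified by the indices $\{k_{\mathcal{S}_j,i} : i = 1, 2, \ldots, |[l] \cap \mathcal{S}_j|\}$. If one or more of these codewords satisfy this typicality condition, then decoder $l$ selects one arbitrarily and sets $\hat{k}_{\mathcal{S}_j}$ equal to its index vector. If there is no such codeword, it sets each of the unknown indices equal to 1.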

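Example 6 (4-Decoders Decoding): Consider the second decoder ($l = 2$). In stage one, take $k_{\mathcal{S}_1,1}$ (from channel 1) and $k_{\mathcal{S}_1,2}$ (from channel 2) and look for a vector $\hat{k}_{\mathcal{S}_1} = (k_{\mathcal{S}_1,1}, k_{\mathcal{S}_1,2}, \hat{k}_{\mathcal{S}_1,3}, \hat{k}_{\mathcal{S}_1,4}, \hat{k}'_{\mathcal{S}_1})$ such that the corresponding codeword $u_{\mathcal{S}_1}(\hat{k}_{\mathcal{S}_1})$ is typical with $y_2$. Similarly, in stage nine take $k_{\mathcal{S}_9,1}$ (from channel 2) and look for $\hat{k}_{\mathcal{S}_9} = (k_{\mathcal{S}_9,1}, \hat{k}_{\mathcal{S}_9,2}, \hat{k}'_{\mathcal{S}_9})$ such that the corresponding codeword $u_{\mathcal{S}_9}(\hat{k}_{\mathcal{S}_9})$ is jointly typical with $y_2$ and $\hat{u}_{\mathcal{S}_1}(\hat{k}_{\mathcal{S}_1})$, $\hat{u}_{\mathcal{S}_2}(\hat{k}_{\mathcal{S}_2})$, $\hat{u}_{\mathcal{S}_3}(\hat{k}_{\mathcal{S}_3})$, $\hat{u}_{\mathcal{S}_5}(\hat{k}_{\mathcal{S}_5})$ and $\hat{u}_{\mathcal{S}_6}(\hat{k}_{\mathcal{S}_6})$, which were decoded during stages one through six. Finally, in stage thirteen take $k_{\mathcal{S}_{13},1}$ (from channel 2) and look for $\hat{k}_{\mathcal{S}_{13}} = (k_{\mathcal{S}_{13},1}, \hat{k}'_{\mathcal{S}_{13}})$ such that the corresponding codeword $u_{\mathcal{S}_{13}}(\hat{k}_{\mathcal{S}_{13}})$ is jointly typical with $y_2$ and $\hat{u}_{\mathcal{S}_1}(\hat{k}_{\mathcal{S}_1})$, $\hat{u}_{\mathcal{S}_2}(\hat{k}_{\mathcal{S}_2})$, $\hat{u}_{\mathcal{S}_3}(\hat{k}_{\mathcal{S}_3})$, $\hat{u}_{\mathcal{S}_5}(\hat{k}_{\mathcal{S}_5})$, $\hat{u}_{\mathcal{S}_6}(\hat{k}_{\mathcal{S}_6})$, $\hat{u}_{\mathcal{S}_9}(\hat{k}_{\mathcal{S}_9})$ and $\hat{u}_{\mathcal{S}_{10}}(\hat{k}_{\mathcal{S}_{10}})$, which were decoded during stages one through ten.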
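The stage-by-stage bookkeeping of Example 6 can be sketched as follows. For a given decoder, the code lists each non-trivial decoding stage, the number of bin indices the decoder already knows at that stage (that is, $|[l] \cap \mathcal{S}_j|$), and the stages it has already decoded. The function name and data layout are assumptions for illustration; in particular, the code lists all previously decoded stages, whereas the decoding sets in (43) may in general be a subset of these.

```python
# The list v of Example 4, written out explicitly (index j = position + 1).
SUBSETS = [
    (1, 2, 3, 4),
    (1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4),
    (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4),
    (1,), (2,), (3,), (4,),
]


def decoder_plan(subsets, l):
    """Return [(stage j, known bin indices, stages decoded earlier), ...] for decoder l."""
    plan = []
    decoded = []
    for j, subset in enumerate(subsets, start=1):
        if l not in subset:
            continue  # trivial stage: decoder l does nothing
        # Decoder l sees channels 1..l, so it knows |[l] ∩ S_j| of the bin indices.
        known = sum(1 for member in subset if member <= l)
        plan.append((j, known, list(decoded)))
        decoded.append(j)
    return plan


plan = decoder_plan(SUBSETS, 2)
for stage, known, earlier in plan:
    print(stage, known, earlier)
```

For decoder 2 the non-trivial stages are 1, 2, 3, 5, 6, 9, 10 and 13, which agrees with the codewords listed in Example 6.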
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D. Error Analysis: Encoding 

The coding scheme is based on $\epsilon$-letter typical set encoding and decoding techniques. As such, the distortion criteria at each decoder will not be satisfied when $(x, y_1, y_2, \ldots, y_t) \notin T_{\epsilon_0}^{(n)}(p)$. We denote this event by $E_1$. From Lemma 2, the probability of this event may be bounded by

$\Pr[E_1] \le \delta_1(n, \epsilon_0, \mu(p))$ ,

where $\delta_1(n, \epsilon_0, \mu(p)) \to 0$ as $n \to \infty$.

Assume $E_1$ does not occur. Let $E_{2,j}$ denote the event that the encoder fails to find an $\epsilon_j$-letter typical codeword during stage $j$ of the encoding procedure given that it found an $\epsilon_i$-letter typical codeword for every stage $i \in [j - 1]$. From Lemma 3 and the inequality $(1 - x)^t \le e^{-tx}$ we have



Pr 



(u/.U/.U.ik/ l.x).; 7; : ";(/,) 



< 



exp 



(1 -<5 2 )2 n (^ +E ^' R ^' 1 ) . 2 ~H /( " V ' V' V: ' / -'" 2 '' //( ' x < 



(45) 



where we have written the function $\delta_2(n, \epsilon_{j-1}, \epsilon_j, \mu(p))$ as $\delta_2$ for compact representation.

Let $E_2$ denote the event where a typical codeword cannot be found at any one of the encoding stages. By the union bound we get the following upper bound for $\Pr[E_2]$:

Pr [Jgy < £ exp [ - (1 - 5 2 )2 n K +E - . ^ 

Finally, note that if 



i=l 



(46) 



for every $j = 1, 2, \ldots, 2^t - 1$, then $\Pr[E_2] \to 0$ as $n \to \infty$.

E. Error Analysis: Decoding

Assume $E_1$ and $E_2$ do not occur. Consider decoder $l$ and a non-trivial decoding stage $j$ where $\mathcal{S}_j \ni l$. Let $D_{l,\mathcal{S}_j}$ be the event that it cannot find a unique codeword that satisfies the typicality condition (44) given that at every stage $i < j$ (where $\mathcal{S}_i \ni l$) it found a unique codeword $u(\hat{k}_{\mathcal{S}_i})$ satisfying this typicality condition.
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By the Markov lemma (Lemma 4), the probability that the codewords \iy.(k,y.), iiy., Uy,. z 
are not jointly typical with yi is small for large n: 



Pr 

An upper bound for the probability that there exists one or more codewords u^.(k^».) 7^ 
Uy.(ky ), which satisfy (j44j), is 



Pr 



U{^.^y'.^M eI Si(p) 

1^1 



< exp 2 i 

»=|p]n^|+l 
where we have taken the union over 



, (47) 



Xj = |ky, ^ k Yj , {k 
Applying the union bound we get 



_ , ,|{iA...,i}r\?S| 
— ^,,1/1=1 



} 



Pr [Di rYj ] <5 2 + exp 2 
Thus, if 



n 



i=\[l]nyj\ 



i=|[«]n.yj|+i 

then $\Pr[D_{l,\mathcal{S}_j}] \to 0$ as $n \to \infty$.
Constraints 



(48) 



Consider decoder $l$ and any subset $\mathcal{S}_j$ where $l \in \mathcal{S}_j$. On combining the rate constraints (46) and (48) we get



Ry,i > l{X^^Us)-l{U^^^Y^ 

i=l 

= l(X,j^. ] U s .\j*? j ) -I^y.-^Y^.) . (49) 
Since $\epsilon_j$ and $\epsilon_{j+1}$ may be selected arbitrarily small, we can ignore the $2(\epsilon_j + \epsilon_{j+1}) H(U_{\mathcal{S}_j})$ term.
Consider the other decoders in $[l] \cap \mathcal{S}_j$. Since $R_{\mathcal{S}_j,i} \ge 0$ for all $i$, it must be true that



\[i]n?\ 



l(X,^;Uy. .</;) - min /(/', :.</; Y) .</; ) 



(50) 



that is, the rate constraint for decoder $l$ must be at least as large as the rate constraint for decoder $l'$ (for every $l' \in [l] \cap \mathcal{S}_j$).

The rate constraint (50) is valid for any set $\mathcal{S}_j$ where $l \in \mathcal{S}_j$. For such subsets, define $l^* = \max_{i \in \mathcal{S}_j \cap [l]} i$. Since $l^* \in \mathcal{S}_j$ and $[l^*] \cap \mathcal{S}_j = [l] \cap \mathcal{S}_j$, it follows that (50) is also valid for any set $\mathcal{S}_j$ where $[l] \cap \mathcal{S}_j \neq \emptyset$.

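Finally, consider the sum rate $\sum_{l'=1}^{l} r_{l'}$ for the first $l$ channels. By construction, we have that

$\sum_{l'=1}^{l} r_{l'} = \sum_{j : [l] \cap \mathcal{S}_j \neq \emptyset} \ \sum_{i=1}^{|[l] \cap \mathcal{S}_j|} R_{\mathcal{S}_j,i}$ . (51)

Substituting the rate constraint (50) into (51) yields the desired result.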



Appendix B

$\epsilon$-Letter Typicality

For $\epsilon > 0$, a sequence $x^n \in \mathcal{X}^n$ is said to be $\epsilon$-letter typical with respect to a discrete memoryless source $(\mathcal{X}, p_X)$ if

$\left| \frac{1}{n} N(a \mid x^n) - p_X(a) \right| \le \epsilon \cdot p_X(a) \quad \forall a \in \mathcal{X}$ ,

where $N(a \mid x^n)$ is the number of times the letter $a$ occurs in the sequence $x^n$. The collection of all $\epsilon$-letter typical sequences is denoted by $T_\epsilon^{(n)}(p_X)$.
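The definition above translates directly into a membership test: count each letter, compare its empirical frequency with $p_X(a)$, and require the deviation to be at most $\epsilon \cdot p_X(a)$ for every letter. The sketch below is one way to write that test; the function name and the example sequence are illustrative assumptions.

```python
from collections import Counter


def is_letter_typical(sequence, pmf, eps):
    """Check |N(a|x^n)/n - p(a)| <= eps * p(a) for every letter a of the alphabet.

    pmf maps each letter of the alphabet to its probability.
    """
    n = len(sequence)
    counts = Counter(sequence)
    # A letter outside the alphabet cannot appear in a typical sequence.
    if any(letter not in pmf for letter in counts):
        return False
    return all(abs(counts[a] / n - p) <= eps * p for a, p in pmf.items())


# Hypothetical example: a fair binary source and a length-8 sequence with five zeros.
fair = {0: 0.5, 1: 0.5}
x = [0, 1, 0, 1, 0, 1, 0, 0]
print(is_letter_typical(x, fair, 0.3), is_letter_typical(x, fair, 0.2))
```

Note that a letter with $p_X(a) = 0$ must not occur at all, since the right hand side of the condition is then zero. The jointly typical case can be handled with the same function by treating each pair $(a, b)$ as a single letter.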

In a similar fashion, a pair of sequences $x^n$ and $y^n$ are said to be jointly $\epsilon$-letter typical with respect to a discrete memoryless two-source $(\mathcal{X} \times \mathcal{Y}, p_{XY})$ if

$\left| \frac{1}{n} N(a, b \mid x^n, y^n) - p_{XY}(a, b) \right| \le \epsilon \cdot p_{XY}(a, b) \quad \forall (a, b) \in \mathcal{X} \times \mathcal{Y}$ ,

where $N(a, b \mid x^n, y^n)$ is the number of times the pair of letters $(a, b)$ occurs in the pair $(x^n, y^n)$. The collection of all jointly $\epsilon$-letter typical sequence pairs is denoted by $T_\epsilon^{(n)}(p_{XY})$.

Given $(\mathcal{X} \times \mathcal{Y}, p_{XY})$ and $x^n \in \mathcal{X}^n$, the set

$T_\epsilon^{(n)}(p_{XY} \mid x^n) = \{ y^n : (x^n, y^n) \in T_\epsilon^{(n)}(p_{XY}) \}$

is called the set of conditionally $\epsilon$-letter typical sequences.
Let $\mu(p_X) = \min\{p_X(x) : x \in \mathrm{support}(p_X)\}$ and define

$\delta_1(n, \epsilon, \mu(p_X)) = 2 |\mathcal{X}| \cdot e^{-n \epsilon^2 \mu(p_X)}$ .

Note, $\delta_1(n, \epsilon, \mu(p_X)) \to 0$ as $n \to \infty$.

Lemma 2 (Theorem 1.1, [20]): Suppose $X^n$ is emitted by a discrete memoryless source $(\mathcal{X}, p_X)$. If $0 < \epsilon \le \mu(p_X)$, then

$1 - \delta_1(n, \epsilon, \mu(p_X)) \le \Pr[X^n \in T_\epsilon^{(n)}(p_X)] \le 1$ .

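Now consider a discrete memoryless two-source $(\mathcal{X} \times \mathcal{Y}, p_{XY})$, let

$\delta_2(n, \epsilon_1, \epsilon_2, \mu(p_{XY})) = 2 |\mathcal{X}| |\mathcal{Y}| \cdot \exp\left( -n \frac{(\epsilon_2 - \epsilon_1)^2}{1 + \epsilon_1} \mu(p_{XY}) \right)$ ,

and note that $\delta_2(n, \epsilon_1, \epsilon_2, \mu(p_{XY})) \to 0$ as $n \to \infty$.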

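Lemma 3 (Theorem 1.3, [20]): Suppose $Y^n$ is emitted by $(\mathcal{Y}, p_Y)$ where $p_Y$ is equal to the $Y$-marginal of $p_{XY}$. If $0 < \epsilon_1 < \epsilon_2 \le \mu(p_{XY})$ and $x^n \in T_{\epsilon_1}^{(n)}(p_X)$, then

$(1 - \delta_2(n, \epsilon_1, \epsilon_2, \mu(p_{XY}))) \, 2^{-n[I(X;Y) + 2 \epsilon_2 H(Y)]}$

$\le \Pr[Y^n \in T_{\epsilon_2}^{(n)}(p_{XY} \mid x^n)] \le 2^{-n[I(X;Y) - 2 \epsilon_2 H(Y)]}$ .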

Finally, a direct consequence of Lemma 3 for Markov sources is the following result.

Lemma 4 (Markov Lemma [20]): Suppose $(X^n, Y^n, Z^n)$ is emitted by a discrete memoryless three-source $(\mathcal{X} \times \mathcal{Y} \times \mathcal{Z}, p_{XYZ})$ where $X \leftrightarrow Y \leftrightarrow Z$. If $0 < \epsilon_1 < \epsilon_2 \le \mu(p_{XYZ})$ and $(x^n, y^n) \in T_{\epsilon_1}^{(n)}(p_{XY})$, then

$\Pr[Z^n \in T_{\epsilon_2}^{(n)}(p_{XYZ} \mid x^n, y^n) \mid Y^n = y^n]$

$= \Pr[Z^n \in T_{\epsilon_2}^{(n)}(p_{XYZ} \mid x^n, y^n) \mid X^n = x^n, Y^n = y^n]$

$\ge 1 - \delta_2(n, \epsilon_1, \epsilon_2, \mu(p_{XYZ}))$ .
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